Virtualization Deployment and Administration Guide

Red Hat Enterprise Linux 7

Installing, configuring, and managing virtual machines on a RHEL physical machine

Jiri Herrmann

Red Hat Customer Content Services

Parth Shah

Red Hat Customer Content Services

Yehuda Zimmerman

Red Hat Customer Content Services

Laura Novich

Red Hat Customer Content Services

Dayle Parker

Red Hat Customer Content Services

Scott Radvan

Red Hat Customer Content Services

Tahlia Richardson

Red Hat Customer Content Services

Abstract

This guide covers how to configure a Red Hat Enterprise Linux 7 machine to act as a virtualization host system, and how to install and configure guest virtual machines using the KVM hypervisor. Other topics include PCI device configuration, SR-IOV, networking, storage, device and guest virtual machine management, as well as troubleshooting, compatibility and restrictions. Procedures that need to be run on the guest virtual machine are explicitly marked as such.

Important

All procedures described in this guide are intended to be performed on an AMD64 or Intel 64 host machine, unless otherwise stated. For using Red Hat Enterprise Linux 7 virtualization on architectures other than AMD64 and Intel 64, see Appendix B, Using KVM Virtualization on Multiple Architectures.
For a more general introduction to the virtualization solutions provided by Red Hat, see the Red Hat Enterprise Linux 7 Virtualization Getting Started Guide.

Part I. Deployment

This part provides instructions on how to install and configure a Red Hat Enterprise Linux 7 machine to act as a virtualization host system, and how to install and configure guest virtual machines using the KVM hypervisor.

Chapter 1. System Requirements

Virtualization is available with the KVM hypervisor for Red Hat Enterprise Linux 7 on the Intel 64 and AMD64 architectures. This chapter lists system requirements for running virtual machines, also referred to as VMs.
For information on installing the virtualization packages, see Chapter 2, Installing the Virtualization Packages.

1.1. Host System Requirements

Minimum host system requirements

  • 6 GB free disk space.
  • 2 GB RAM.

Recommended system requirements

  • One core or thread for each virtualized CPU and one for the host.
  • 2 GB of RAM, plus additional RAM for virtual machines.
  • 6 GB disk space for the host, plus the required disk space for the virtual machine(s).
    Most guest operating systems require at least 6 GB of disk space. Additional storage space for each guest depends on its workload.

    Swap space

    Swap space in Linux is used when the physical memory (RAM) is full. If the system needs more memory resources and the RAM is full, inactive pages in memory are moved to the swap space. While swap space can help machines with a small amount of RAM, it should not be considered a replacement for more RAM. Swap space is located on hard drives, which have a slower access time than physical memory. The size of your swap partition can be calculated from the physical RAM of the host. The Red Hat Customer Portal contains an article on safely and efficiently determining the size of the swap partition: https://access.redhat.com/site/solutions/15244.
    • When using raw image files, the total disk space required is equal to or greater than the sum of the space required by the image files, the 6 GB of space required by the host operating system, and the swap space for the guest.
      For qcow and qcow2 images, which are able to grow as required, you must also calculate the expected maximum storage requirements of the guest. To allow for this expansion, multiply the expected maximum storage requirements of the guest (expected maximum guest storage) by 1.01, and add to this the space required by the host (host) and the necessary swap space (swap):
      total for qcow format = (expected maximum guest storage × 1.01) + host + swap
      See the worked example below.
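      As a worked example (the numbers are illustrative only): for a guest with an expected maximum of 50 GB of qcow2 storage, a host requiring 6 GB, and 4 GB of guest swap space:
      total = (50 GB × 1.01) + 6 GB + 4 GB = 60.5 GB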
Guest virtual machine requirements are further outlined in Chapter 7, Overcommitting with KVM.

1.2. KVM Hypervisor Requirements

The KVM hypervisor requires:
  • an Intel processor with the Intel VT-x and Intel 64 virtualization extensions for x86-based systems; or
  • an AMD processor with the AMD-V and the AMD64 virtualization extensions.
Virtualization extensions (Intel VT-x or AMD-V) are required for full virtualization. Enter the following commands to determine whether your system has the hardware virtualization extensions and whether they are enabled.

Procedure 1.1. Verifying virtualization extensions

  1. Verify the CPU virtualization extensions are available

    Enter the following command to verify that the CPU virtualization extensions are available:
    $ grep -E 'svm|vmx' /proc/cpuinfo
  2. Analyze the output

    • The following example output contains a vmx entry, indicating an Intel processor with the Intel VT-x extension:
      flags   : fpu tsc msr pae mce cx8 vmx apic mtrr mca cmov pat pse36 clflush
      dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl
      vmx est tm2 cx16 xtpr lahf_lm
      
    • The following example output contains an svm entry, indicating an AMD processor with the AMD-V extensions:
      flags   :  fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
      mmx fxsr sse sse2 ht syscall nx mmxext svm fxsr_opt lm 3dnowext 3dnow pni cx16
      lahf_lm cmp_legacy svm cr8legacy ts fid vid ttp tm stc
      
    If the grep -E 'svm|vmx' /proc/cpuinfo command returns any output, the processor contains the hardware virtualization extensions. In some circumstances, manufacturers disable the virtualization extensions in the BIOS. If the extensions do not appear, or full virtualization does not work, see Procedure A.3, “Enabling virtualization extensions in BIOS” for instructions on enabling the extensions in your BIOS configuration utility.
  3. Ensure the KVM kernel modules are loaded

    As an additional check, verify that the kvm modules are loaded in the kernel with the following command:
    # lsmod | grep kvm
    If the output includes kvm_intel or kvm_amd, the kvm hardware virtualization modules are loaded.

Note

The virsh utility (provided by the libvirt-client package) can output a full list of your system's virtualization capabilities with the following command:
# virsh capabilities
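For convenience, the checks from Procedure 1.1 can be combined in a short shell sketch. This is an illustrative helper, not part of the official procedure, and the messages are only examples:
#!/bin/bash
# Hedged sketch: combine the virtualization-extension and KVM module checks.
if grep -qE 'svm|vmx' /proc/cpuinfo; then
    echo "CPU virtualization extensions present (Intel VT-x or AMD-V)."
else
    echo "No virtualization extensions found; check the BIOS settings." >&2
fi
if lsmod | grep -q '^kvm_\(intel\|amd\)'; then
    echo "The kvm_intel or kvm_amd hardware virtualization module is loaded."
else
    echo "kvm_intel/kvm_amd not loaded; try 'modprobe kvm_intel' or 'modprobe kvm_amd'." >&2
fi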

1.3. KVM Guest Virtual Machine Compatibility

Red Hat Enterprise Linux 7 servers have certain support limits.
For the processor and memory limits of Red Hat Enterprise Linux, and for the list of guest operating systems certified to run on a Red Hat Enterprise Linux KVM host, see the corresponding articles on the Red Hat Customer Portal.

Note

For additional information on the KVM hypervisor's restrictions and support limits, see Appendix C, Virtualization Restrictions.

1.4. Supported Guest CPU Models

Every hypervisor has its own policy for which CPU features the guest will see by default. The set of CPU features presented to the guest by the hypervisor depends on the CPU model chosen in the guest virtual machine configuration.

1.4.1. Listing the Guest CPU Models

To view a full list of the CPU models supported for an architecture type, run the virsh cpu-models architecture command. For example:
$ virsh cpu-models x86_64
486
pentium
pentium2
pentium3
pentiumpro
coreduo
n270
core2duo
qemu32
kvm32
cpu64-rhel5
cpu64-rhel6
kvm64
qemu64
Conroe
Penryn
Nehalem
Westmere
SandyBridge
Haswell
athlon
phenom
Opteron_G1
Opteron_G2
Opteron_G3
Opteron_G4
Opteron_G5
$ virsh cpu-models ppc64
POWER7
POWER7_v2.1
POWER7_v2.3
POWER7+_v2.1
POWER8_v1.0
The full list of supported CPU models and features is contained in the cpu_map.xml file, located in /usr/share/libvirt/:
# cat /usr/share/libvirt/cpu_map.xml
A guest's CPU model and features can be changed in the <cpu> section of the domain XML file. See Section 23.12, “CPU Models and Topology” for more information.
The host model can be configured to use a specified feature set as needed. For more information, see Section 23.12.1, “Changing the Feature Set for a Specified CPU”.
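As an illustration only, the following is a minimal sketch of a <cpu> definition that selects one of the models listed above and requires an additional feature; the model and feature names are arbitrary examples, not recommendations:
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>SandyBridge</model>
  <feature policy='require' name='aes'/>
</cpu>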

Chapter 2. Installing the Virtualization Packages

To use virtualization, Red Hat virtualization packages must be installed on your computer. Virtualization packages can be installed when installing Red Hat Enterprise Linux or after installation using the yum command and the Subscription Manager application.
The KVM hypervisor uses the default Red Hat Enterprise Linux kernel with the kvm kernel module.

2.1. Installing Virtualization Packages During a Red Hat Enterprise Linux Installation

This section provides information about installing virtualization packages while installing Red Hat Enterprise Linux.

Note

For detailed information about installing Red Hat Enterprise Linux, see the Red Hat Enterprise Linux 7 Installation Guide.

Important

The Anaconda interface only offers the option to install Red Hat virtualization packages during the installation of Red Hat Enterprise Linux Server.
When installing a Red Hat Enterprise Linux Workstation, the Red Hat virtualization packages can only be installed after the workstation installation is complete. See Section 2.2, “Installing Virtualization Packages on an Existing Red Hat Enterprise Linux System”.

Procedure 2.1. Installing virtualization packages

  1. Select software

    Follow the installation procedure until the Installation Summary screen.
    Image shows the Installation Summary screen, which lists configurable options under group headings. Under the Localization heading : Date & Time, Language Support, and Keyboard. Under the Software heading: Installation Source and Software Selection. Under the System heading: Installation Destination and Network & Hostname.

    Figure 2.1. The Installation Summary screen

    In the Installation Summary screen, click Software Selection. The Software Selection screen opens.
  2. Select the server type and package groups

    You can install Red Hat Enterprise Linux 7 with only the basic virtualization packages or with packages that allow management of guests through a graphical user interface. Do one of the following:
    • Install a minimal virtualization host
      Select the Virtualization Host radio button in the Base Environment pane and the Virtualization Platform check box in the Add-Ons for Selected Environment pane. This installs a basic virtualization environment which can be run with virsh or remotely over the network.
      Image shows the Software Selection screen, which lists options under two headings: Base Environment and Add-Ons for Selected Environment. Virtualization Host is highlighted from the options under Base Environment, and Virtualization Platform is highlighted from the options under Add-Ons for Selected Environment.

      Figure 2.2. Virtualization Host selected in the Software Selection screen

    • Install a virtualization host with a graphical user interface
      Select the Server with GUI radio button in the Base Environment pane and the Virtualization Client, Virtualization Hypervisor, and Virtualization Tools check boxes in the Add-Ons for Selected Environment pane. This installs a virtualization environment along with graphical tools for installing and managing guest virtual machines.
      Image shows the Software Selection screen, which lists options under two headings: Base Environment and Add-Ons for Selected Environment. Server with GUI is highlighted from the options under Base Environment, and Virtualization Client, Virtualization Hypervisor, and Virtualization Tools are highlighted from the options under Add-Ons for Selected Environment.

      Figure 2.3. Server with GUI selected in the software selection screen

  3. Finalize installation

    Click Done and continue with the installation.

Important

You need a valid Red Hat Enterprise Linux subscription to receive updates for the virtualization packages.

2.1.1. Installing KVM Packages with Kickstart Files

To use a Kickstart file to install Red Hat Enterprise Linux with the virtualization packages, append the following package groups in the %packages section of your Kickstart file:
@virtualization-hypervisor
@virtualization-client
@virtualization-platform
@virtualization-tools
For more information about installing with Kickstart files, see the Red Hat Enterprise Linux 7 Installation Guide.
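For illustration, a minimal sketch of such a %packages section follows; the rest of the Kickstart file and any additional package selections are omitted:
%packages
@virtualization-hypervisor
@virtualization-client
@virtualization-platform
@virtualization-tools
%end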

2.2. Installing Virtualization Packages on an Existing Red Hat Enterprise Linux System

This section describes the steps for installing the KVM hypervisor on an existing Red Hat Enterprise Linux 7 system.
To install the packages, your machine must be registered and subscribed to the Red Hat Customer Portal. To register using Red Hat Subscription Manager, run the subscription-manager register command and follow the prompts. Alternatively, run the Red Hat Subscription Manager application from Applications → System Tools on the desktop to register.
If you do not have a valid Red Hat subscription, visit the Red Hat online store to obtain one. For more information on registering and subscribing a system to the Red Hat Customer Portal, see https://access.redhat.com/solutions/253273.

2.2.1. Installing Virtualization Packages Manually

To use virtualization on Red Hat Enterprise Linux, at minimum, you need to install the following packages:
  • qemu-kvm: This package provides the user-level KVM emulator and facilitates communication between hosts and guest virtual machines.
  • qemu-img: This package provides disk management for guest virtual machines.

    Note

    The qemu-img package is installed as a dependency of the qemu-kvm package.
  • libvirt: This package provides the server and host-side libraries for interacting with hypervisors and host systems, and the libvirtd daemon that handles the library calls, manages virtual machines, and controls the hypervisor.
To install these packages, enter the following command:
# yum install qemu-kvm libvirt
Several additional virtualization management packages are also available and are recommended when using virtualization:
  • virt-install: This package provides the virt-install command for creating virtual machines from the command line.
  • libvirt-python: This package contains a module that permits applications written in the Python programming language to use the interface supplied by the libvirt API.
  • virt-manager: This package provides the virt-manager tool, also known as Virtual Machine Manager. This is a graphical tool for administering virtual machines. It uses the libvirt-client library as the management API.
  • libvirt-client: This package provides the client-side APIs and libraries for accessing libvirt servers. The libvirt-client package includes the virsh command-line tool to manage and control virtual machines and hypervisors from the command line or a special virtualization shell.
You can install all of these recommended virtualization packages with the following command:
# yum install virt-install libvirt-python virt-manager libvirt-client
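After the installation, one possible way to confirm that the libvirtd service is running and that the tools can reach it is the following sketch (not a required step):
# systemctl enable libvirtd
# systemctl start libvirtd
# virsh list --all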

2.2.2. Installing Virtualization Package Groups

The virtualization packages can also be installed from package groups. You can view the list of available groups by running the yum grouplist hidden command.
Out of the complete list of available package groups, the following table describes the virtualization package groups and what they provide.

Table 2.1. Virtualization Package Groups

Package Group | Description | Mandatory Packages | Optional Packages
Virtualization Hypervisor | Smallest possible virtualization host installation | libvirt, qemu-kvm, qemu-img | qemu-kvm-tools
Virtualization Client | Clients for installing and managing virtualization instances | gnome-boxes, virt-install, virt-manager, virt-viewer, qemu-img | virt-top, libguestfs-tools, libguestfs-tools-c
Virtualization Platform | Provides an interface for accessing and controlling virtual machines and containers | libvirt, libvirt-client, virt-who, qemu-img | fence-virtd-libvirt, fence-virtd-multicast, fence-virtd-serial, libvirt-cim, libvirt-java, libvirt-snmp, perl-Sys-Virt
Virtualization Tools | Tools for offline virtual image management | libguestfs, qemu-img | libguestfs-java, libguestfs-tools, libguestfs-tools-c
To install a package group, run the yum group install package_group command. For example, to install the Virtualization Tools package group with all the package types, run:
# yum group install "Virtualization Tools" --setopt=group_package_types=mandatory,default,optional
For more information on installing package groups, see How to install a group of packages with yum on Red Hat Enterprise Linux? Knowledgebase article.

Chapter 3. Creating a Virtual Machine

After you have installed the virtualization packages on your Red Hat Enterprise Linux 7 host system, you can create virtual machines and install guest operating systems using the virt-manager interface. Alternatively, you can use the virt-install command-line utility with a list of parameters or with a script. Both methods are covered in this chapter.

3.1. Guest Virtual Machine Deployment Considerations

Various factors should be considered before creating any guest virtual machines. The role of a virtual machine should be evaluated before deployment, but regular monitoring and assessment based on variable factors (load, amount of clients) should also be performed. The factors include:
Performance
Guest virtual machines should be deployed and configured based on their intended tasks. Some guest systems (for instance, guests running a database server) may require special performance considerations. Guests may require more assigned CPUs or memory based on their role and projected system load.
Input/Output requirements and types of Input/Output
Some guest virtual machines may have a particularly high I/O requirement or may require further considerations or projections based on the type of I/O (for instance, typical disk block size access, or the amount of clients).
Storage
Some guest virtual machines may require higher priority access to storage or faster disk types, or may require exclusive access to areas of storage. The amount of storage used by guests should also be regularly monitored and taken into account when deploying and maintaining storage. Make sure to read all the considerations outlined in Red Hat Enterprise Linux 7 Virtualization Security Guide. It is also important to understand that your physical storage may limit your options in your virtual storage.
Networking and network infrastructure
Depending upon your environment, some guest virtual machines could require faster network links than other guests. Bandwidth or latency are often factors when deploying and maintaining guests, especially as requirements or load changes.
Request requirements
SCSI requests can only be issued to guest virtual machines on virtio drives if the virtio drives are backed by whole disks, and the disk device parameter is set to lun in the domain XML file, as shown in the following example:
<devices>
  <emulator>/usr/libexec/qemu-kvm</emulator>
  <disk type='block' device='lun'>
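For illustration, the following is a hedged sketch of one way such a disk might be fully expressed, assuming the guest is given the whole host disk /dev/sdb (a placeholder) through a virtio-scsi controller; adjust the source device and target to your configuration:
<devices>
  <emulator>/usr/libexec/qemu-kvm</emulator>
  <controller type='scsi' model='virtio-scsi'/>
  <disk type='block' device='lun'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/sdb'/>
    <target dev='sda' bus='scsi'/>
  </disk>
</devices>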
                

3.2. Creating Guests with virt-install

You can use the virt-install command to create virtual machines and install operating systems on those virtual machines from the command line. virt-install can be used either interactively or as part of a script to automate the creation of virtual machines. If you are using an interactive graphical installation, you must have virt-viewer installed before you run virt-install. In addition, you can start an unattended installation of virtual machine operating systems using virt-install with Kickstart files.

Note

You might need root privileges in order for some virt-install commands to complete successfully.
The virt-install utility uses a number of command-line options. However, most virt-install options are not required.
The main required options for virtual guest machine installations are:
--name
The name of the virtual machine.
--memory
The amount of memory (RAM) to allocate to the guest, in MiB.
Guest storage
Use one of the following guest storage options:
  • --disk
    The storage configuration details for the virtual machine. If you use the --disk none option, the virtual machine is created with no disk space.
  • --filesystem
    The path to the file system for the virtual machine guest.
Installation method
Use one of the following installation methods:
  • --location
    The location of the installation media.
  • --cdrom
    The file or device used as a virtual CD-ROM device. It can be a path to an ISO image, or a URL from which to fetch or access a minimal boot ISO image. However, it cannot be a physical host CD-ROM or DVD-ROM device.
  • --pxe
    Uses the PXE boot protocol to load the initial ramdisk and kernel for starting the guest installation process.
  • --import
    Skips the OS installation process and builds a guest around an existing disk image. The device used for booting is the first device specified by the disk or filesystem option.
  • --boot
    The post-install VM boot configuration. This option allows you to specify a boot device order, permanently boot from a kernel and initrd with optional kernel arguments, and enable a BIOS boot menu.
To see a complete list of options, enter the following command:
# virt-install --help
To see a complete list of attributes for an option, enter the following command:
# virt-install --option=?
The virt-install man page also documents each command option, important variables, and examples.
Prior to running virt-install, you may also need to use qemu-img to configure storage options. For instructions on using qemu-img, see Chapter 14, Using qemu-img.

3.2.1. Installing a virtual machine from an ISO image

The following example installs a virtual machine from an ISO image:
# virt-install \ 
  --name guest1-rhel7 \ 
  --memory 2048 \ 
  --vcpus 2 \ 
  --disk size=8 \ 
  --cdrom /path/to/rhel7.iso \ 
  --os-variant rhel7 
The --cdrom /path/to/rhel7.iso option specifies that the virtual machine will be installed from the CD or DVD image at the specified location.

3.2.2. Importing a virtual machine image

The following example imports a virtual machine from a virtual disk image:
# virt-install \ 
  --name guest1-rhel7 \ 
  --memory 2048 \ 
  --vcpus 2 \ 
  --disk /path/to/imported/disk.qcow \ 
  --import \ 
  --os-variant rhel7 
The --import option specifies that the virtual machine will be imported from the virtual disk image specified by the --disk /path/to/imported/disk.qcow option.

3.2.3. Installing a virtual machine from the network

The following example installs a virtual machine from a network location:
# virt-install \ 
  --name guest1-rhel7 \ 
  --memory 2048 \ 
  --vcpus 2 \ 
  --disk size=8 \ 
  --location http://example.com/path/to/os \ 
  --os-variant rhel7 
The --location http://example.com/path/to/os option specifies that the installation tree is at the specified network location.

3.2.4. Installing a virtual machine using PXE

When installing a virtual machine using the PXE boot protocol, both the --network option specifying a bridged network and the --pxe option must be specified.
The following example installs a virtual machine using PXE:
# virt-install \ 
  --name guest1-rhel7 \ 
  --memory 2048 \ 
  --vcpus 2 \ 
  --disk size=8 \ 
  --network=bridge:br0 \ 
  --pxe \ 
  --os-variant rhel7 

3.2.5. Installing a virtual machine with Kickstart

The following example installs a virtual machine using a kickstart file:
# virt-install \ 
  --name guest1-rhel7 \ 
  --memory 2048 \ 
  --vcpus 2 \ 
  --disk size=8 \ 
  --location http://example.com/path/to/os \ 
  --os-variant rhel7 \
  --initrd-inject /path/to/ks.cfg \ 
  --extra-args="ks=file:/ks.cfg console=tty0 console=ttyS0,115200n8" 
The --initrd-inject and --extra-args options specify that the virtual machine will be installed using a Kickstart file.

3.2.6. Configuring the guest virtual machine network during guest creation

When creating a guest virtual machine, you can specify and configure the network for the virtual machine. This section provides the options for each of the guest virtual machine main network types.

Default network with NAT

The default network uses libvirtd's network address translation (NAT) virtual network switch. For more information about NAT, see Section 6.1, “Network Address Translation (NAT) with libvirt”.
Before creating a guest virtual machine with the default network with NAT, ensure that the libvirt-daemon-config-network package is installed.
To configure a NAT network for the guest virtual machine, use the following option for virt-install:
--network default

Note

If no network option is specified, the guest virtual machine is configured with a default network with NAT.

Bridged network with DHCP

When configured for bridged networking, the guest uses an external DHCP server. This option should be used if the host has a static networking configuration and the guest requires full inbound and outbound connectivity with the local area network (LAN). It should be used if live migration will be performed with the guest virtual machine. To configure a bridged network with DHCP for the guest virtual machine, use the following option:
--network br0

Note

The bridge must be created separately, prior to running virt-install. For details on creating a network bridge, see Section 6.4.1, “Configuring Bridged Networking on a Red Hat Enterprise Linux 7 Host”.

Bridged network with a static IP address

Bridged networking can also be used to configure the guest to use a static IP address. To configure a bridged network with a static IP address for the guest virtual machine, use the following options:
--network br0 \
--extra-args "ip=192.168.1.2::192.168.1.1:255.255.255.0:test.example.com:eth0:none"
For more information on network booting options, see the Red Hat Enterprise Linux 7 Installation Guide.
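To put these options in context, the following is a hedged sketch of a complete command that combines them with the installation options used in the earlier examples; all names, paths, and addresses are placeholders:
# virt-install \
  --name guest1-rhel7 \
  --memory 2048 \
  --vcpus 2 \
  --disk size=8 \
  --location http://example.com/path/to/os \
  --os-variant rhel7 \
  --network=bridge:br0 \
  --extra-args "ip=192.168.1.2::192.168.1.1:255.255.255.0:test.example.com:eth0:none"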

No network

To configure a guest virtual machine with no network interface, use the following option:
--network=none

3.3. Creating Guests with virt-manager

The Virtual Machine Manager, also known as virt-manager, is a graphical tool for creating and managing guest virtual machines.
This section covers how to install a Red Hat Enterprise Linux 7 guest virtual machine on a Red Hat Enterprise Linux 7 host using virt-manager.
These procedures assume that the KVM hypervisor and all other required packages are installed and the host is configured for virtualization. For more information on installing the virtualization packages, see Chapter 2, Installing the Virtualization Packages.

3.3.1. virt-manager installation overview

The New VM wizard breaks down the virtual machine creation process into five steps:
  1. Choosing the hypervisor and installation type
  2. Locating and configuring the installation media
  3. Configuring memory and CPU options
  4. Configuring the virtual machine's storage
  5. Configuring virtual machine name, networking, architecture, and other hardware settings
Ensure that virt-manager can access the installation media (whether locally or over the network) before you continue.

3.3.2. Creating a Red Hat Enterprise Linux 7 Guest with virt-manager

This procedure covers creating a Red Hat Enterprise Linux 7 guest virtual machine with a locally stored installation DVD or DVD image. Red Hat Enterprise Linux 7 DVD images are available from the Red Hat Customer Portal.

Note

If you wish to install a virtual machine with SecureBoot enabled, see Creating a SecureBoot Red Hat Enterprise Linux 7 Guest with virt-manager.

Procedure 3.1. Creating a Red Hat Enterprise Linux 7 guest virtual machine with virt-manager using local installation media

  1. Optional: Preparation

    Prepare the storage environment for the virtual machine. For more information on preparing storage, see Chapter 13, Managing Storage for Virtual Machines.

    Important

    Various storage types may be used for storing guest virtual machines. However, for a virtual machine to be able to use migration features, the virtual machine must be created on networked storage.
    Red Hat Enterprise Linux 7 requires at least 1 GB of storage space. However, Red Hat recommends at least 5 GB of storage space for a Red Hat Enterprise Linux 7 installation and for the procedures in this guide.
  2. Open virt-manager and start the wizard

    Open virt-manager by executing the virt-manager command as root or opening Applications → System Tools → Virtual Machine Manager.
    The Virtual Machine Manager window

    Figure 3.1. The Virtual Machine Manager window

    Optionally, open a remote hypervisor by selecting the hypervisor and clicking the Connect button.
    Click the Create a new virtual machine button to start the new virtualized guest wizard.
    The New VM window opens.
  3. Specify installation type

    Select the installation type:
    Local install media (ISO image or CDROM)
    This method uses an image of an installation disk (for example, .iso). However, using a host CD-ROM or a DVD-ROM device is not possible.
    Network Install (HTTP, FTP, or NFS)
    This method involves the use of a mirrored Red Hat Enterprise Linux or Fedora installation tree to install a guest. The installation tree must be accessible through either HTTP, FTP, or NFS.
    If you select Network Install, provide the installation URL and also Kernel options, if required.
    Network Boot (PXE)
    This method uses a Preboot eXecution Environment (PXE) server to install the guest virtual machine. Setting up a PXE server is covered in the Red Hat Enterprise Linux 7 Installation Guide. To install using network boot, the guest must have a routable IP address or shared network device.
    If you select Network Boot, continue to step 5. After all steps are completed, a DHCP request is sent, and if a valid PXE server is found, the guest virtual machine's installation process starts.
    Import existing disk image
    This method allows you to create a new guest virtual machine and import a disk image (containing a pre-installed, bootable operating system) to it.
    Virtual machine installation method

    Figure 3.2. Virtual machine installation method

    Click Forward to continue.
  4. Select the installation source

    1. If you selected Local install media (ISO image or CDROM), specify your intended local installation media.
      Local ISO image installation

      Figure 3.3. Local ISO image installation

      Warning

      Even though the option is currently present in the GUI, installing from a physical CD-ROM or DVD device on the host is not possible. Therefore, selecting the Use CDROM or DVD option will cause the VM installation to fail. For details, see the Red Hat Knowledge Base.
      To install from an ISO image, select Use ISO image and click the Browse... button to open the Locate media volume window.
      Select the installation image you wish to use, and click Choose Volume.
      If no images are displayed in the Locate media volume window, click the Browse Local button to browse the host machine for the installation image or DVD drive containing the installation disk. Select the installation image or DVD drive containing the installation disk and click Open; the volume is selected for use and you are returned to the Create a new virtual machine wizard.

      Important

      For ISO image files and guest storage images, the recommended location to use is /var/lib/libvirt/images/. Any other location may require additional configuration by SELinux. See the Red Hat Enterprise Linux Virtualization Security Guide or the Red Hat Enterprise Linux SELinux User's and Administrator's Guide for more details on configuring SELinux.
    2. If you selected Network Install, input the URL of the installation source and also the required Kernel options, if any. The URL must point to the root directory of an installation tree, which must be accessible through either HTTP, FTP, or NFS.
      To perform a kickstart installation, specify the URL of a kickstart file in Kernel options, starting with ks=
      Network kickstart installation

      Figure 3.4. Network kickstart installation

      Note

      For a complete list of kernel options, see the Red Hat Enterprise Linux 7 Installation Guide.
    Next, configure the OS type and Version of the installation. Ensure that you select the appropriate operating system type for your virtual machine. This can be specified manually or by selecting the Automatically detect operating system based on install media check box.
    Click Forward to continue.
  5. Configure memory (RAM) and virtual CPUs

    Specify the number of CPUs and amount of memory (RAM) to allocate to the virtual machine. The wizard shows the number of CPUs and amount of memory you can allocate; these values affect the host's and guest's performance.
    Virtual machines require sufficient physical memory (RAM) to run efficiently and effectively. Red Hat supports a minimum of 512 MB of RAM for a virtual machine. Red Hat recommends at least 1024 MB of RAM for each logical core.
    Assign sufficient virtual CPUs for the virtual machine. If the virtual machine runs a multi-threaded application, assign the number of virtual CPUs the guest virtual machine will require to run efficiently.
    You cannot assign more virtual CPUs than there are physical processors (or hyper-threads) available on the host system. The number of virtual CPUs available is noted in the Up to X available field.
    Configuring Memory and CPU

    Figure 3.5. Configuring Memory and CPU

    After you have configured the memory and CPU settings, click Forward to continue.

    Note

    Memory and virtual CPUs can be overcommitted. For more information on overcommitting, see Chapter 7, Overcommitting with KVM.
  6. Configure storage

    Enable and assign sufficient space for your virtual machine and any applications it requires. Assign at least 5 GB for a desktop installation or at least 1 GB for a minimal installation.
    Configuring virtual storage

    Figure 3.6. Configuring virtual storage

    Note

    Live and offline migrations require virtual machines to be installed on shared network storage. For information on setting up shared storage for virtual machines, see Section 15.4, “Shared Storage Example: NFS for a Simple Migration”.
    1. With the default local storage

      Select the Create a disk image on the computer's hard drive radio button to create a file-based image in the default storage pool, the /var/lib/libvirt/images/ directory. Enter the size of the disk image to be created. If the Allocate entire disk now check box is selected, a disk image of the size specified will be created immediately. If not, the disk image will grow as it becomes filled.

      Note

      Although the storage pool is a virtual container, it is limited by two factors: the maximum size allowed by qemu-kvm and the size of the disk on the host physical machine. Storage pools may not exceed the size of the disk on the host physical machine. The maximum sizes are as follows:
      • virtio-blk = 2^63 bytes or 8 exabytes (using raw files or disks)
      • Ext4 = ~16 TB (using 4 KB block size)
      • XFS = ~8 exabytes
      • qcow2 and host file systems keep their own metadata, and scalability should be evaluated and tuned when trying very large image sizes. Using raw disks means fewer layers that could affect scalability or maximum size.
      Click Forward to create a disk image on the local hard drive. Alternatively, select Select managed or other existing storage, then select Browse to configure managed storage.
    2. With a storage pool

      If you select Select managed or other existing storage to use a storage pool, click Browse to open the Locate or create storage volume window.
      The Choose Storage Volume window

      Figure 3.7. The Choose Storage Volume window

      1. Select a storage pool from the Storage Pools list.
      2. Optional: Click the + button to create a new storage volume. The Add a Storage Volume screen will appear. Enter the name of the new storage volume.
        Choose a format option from the Format drop-down menu. Format options include raw, qcow2, and qed. Adjust other fields as needed. Note that the qcow2 version used here is version 3. To change the qcow version see Section 23.19.2, “Setting Target Elements”
        The Add a Storage Volume window

        Figure 3.8. The Add a Storage Volume window

    Select the new volume and click Choose volume. Next, click Finish to return to the New VM wizard. Click Forward to continue.
  7. Name and final configuration

    Name the virtual machine. Virtual machine names can contain letters, numbers and the following characters: underscores (_), periods (.), and hyphens (-). Virtual machine names must be unique for migration and cannot consist only of numbers.
    By default, the virtual machine will be created with network address translation (NAT) for a network called 'default' . To change the network selection, click Network selection and select a host device and source mode.
    Verify the settings of the virtual machine and click Finish when you are satisfied; this will create the virtual machine with specified networking settings, virtualization type, and architecture.
    Verifying the configuration

    Figure 3.9. Verifying the configuration

    Or, to further configure the virtual machine's hardware, check the Customize configuration before install check box to change the guest's storage or network devices, to use the paravirtualized (virtio) drivers or to add additional devices. This opens another wizard that will allow you to add, remove, and configure the virtual machine's hardware settings.

    Note

    Red Hat Enterprise Linux 4 or Red Hat Enterprise Linux 5 guest virtual machines cannot be installed using graphical mode. As such, you must select "Cirrus" instead of "QXL" as a video card.
    After configuring the virtual machine's hardware, click Apply. virt-manager will then create the virtual machine with your specified hardware settings.

    Warning

    When installing a Red Hat Enterprise Linux 7 guest virtual machine from a remote medium but without a configured TCP/IP connection, the installation fails. However, when installing a guest virtual machine of Red Hat Enterprise Linux 5 or 6 in such circumstances, the installer opens a "Configure TCP/IP" interface.
    For further information about this difference, see the related knowledgebase article.
    Click Finish to continue into the Red Hat Enterprise Linux installation sequence. For more information on installing Red Hat Enterprise Linux 7, see the Red Hat Enterprise Linux 7 Installation Guide.
A Red Hat Enterprise Linux 7 guest virtual machine is now created from an ISO installation disk image.

3.4. Comparison of virt-install and virt-manager Installation options

This table provides a quick reference to compare equivalent virt-install and virt-manager installation options when installing a virtual machine.
Most virt-install options are not required. The minimum requirements are --name, --memory, guest storage (--disk, --filesystem, or --disk none), and an install method (--location, --cdrom, --pxe, --import, or --boot). These options are further specified with arguments; to see a complete list of command options and related arguments, enter the following command:
# virt-install --help
In virt-manager, at minimum, a name, installation method, memory (RAM), vCPUs, and storage are required.

Table 3.1. virt-install and virt-manager configuration comparison for guest installations

Configuration on virtual machine | virt-install option | virt-manager installation wizard label and step number
Virtual machine name | --name, -n | Name (step 5)
RAM to allocate (MiB) | --ram, -r | Memory (RAM) (step 3)
Storage - specify storage media | --disk | Enable storage for this virtual machine → Create a disk image on the computer's hard drive, or Select managed or other existing storage (step 4)
Storage - export a host directory to the guest | --filesystem | Enable storage for this virtual machine → Select managed or other existing storage (step 4)
Storage - configure no local disk storage on the guest | --nodisks | Deselect the Enable storage for this virtual machine check box (step 4)
Installation media location (local install) | --file | Local install media → Locate your install media (steps 1-2)
Installation using a distribution tree (network install) | --location | Network install → URL (steps 1-2)
Install guest with PXE | --pxe | Network boot (step 1)
Number of vCPUs | --vcpus | CPUs (step 3)
Host network | --network | Advanced options drop-down menu (step 5)
Operating system variant/version | --os-variant | Version (step 2)
Graphical display method | --graphics, --nographics | * virt-manager provides GUI installation only

Chapter 4. Cloning Virtual Machines

There are two types of guest virtual machine instances used in creating guest copies:
  • Clones are instances of a single virtual machine. Clones can be used to set up a network of identical virtual machines, and they can also be distributed to other destinations.
  • Templates are instances of a virtual machine that are designed to be used as a source for cloning. You can create multiple clones from a template and make minor modifications to each clone. This is useful in seeing the effects of these changes on the system.
Both clones and templates are virtual machine instances. The difference between them is in how they are used.
For the created clone to work properly, information and configurations unique to the virtual machine that is being cloned usually have to be removed before cloning. The information that needs to be removed differs, based on how the clones will be used.
The information and configurations to be removed may be on any of the following levels:
  • Platform level information and configurations include anything assigned to the virtual machine by the virtualization solution. Examples include the number of Network Interface Cards (NICs) and their MAC addresses.
  • Guest operating system level information and configurations include anything configured within the virtual machine. Examples include SSH keys.
  • Application level information and configurations include anything configured by an application installed on the virtual machine. Examples include activation codes and registration information.

    Note

    This chapter does not include information about removing the application level, because the information and approach is specific to each application.
As a result, some of the information and configurations must be removed from within the virtual machine, while other information and configurations must be removed from the virtual machine using the virtualization environment (for example, Virtual Machine Manager or VMware).

Note

For information on cloning storage volumes, see Section 13.3.2.1, “Creating Storage Volumes with virsh”.

4.1. Preparing Virtual Machines for Cloning

Before cloning a virtual machine, it must be prepared by running the virt-sysprep utility on its disk image, or by using the following steps:
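If you use the virt-sysprep route, a minimal sketch follows; it assumes a shut-down guest named demo (a placeholder) and that the libguestfs-tools-c package, which provides virt-sysprep, is installed. Otherwise, use the manual steps in Procedure 4.1.
# virt-sysprep -d demo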

Procedure 4.1. Preparing a virtual machine for cloning

  1. Set up the virtual machine

    1. Build the virtual machine that is to be used for the clone or template.
      • Install any software needed on the clone.
      • Configure any non-unique settings for the operating system.
      • Configure any non-unique application settings.
  2. Remove the network configuration

    1. Remove any persistent udev rules using the following command:
      # rm -f /etc/udev/rules.d/70-persistent-net.rules

      Note

      If udev rules are not removed, the name of the first NIC may be eth1 instead of eth0.
    2. Remove unique network details from ifcfg scripts by making the following edits to /etc/sysconfig/network-scripts/ifcfg-eth[x]:
      1. Remove the HWADDR and Static lines

        Note

        If the HWADDR does not match the new guest's MAC address, the ifcfg will be ignored. Therefore, it is important to remove the HWADDR from the file.
        DEVICE=eth[x]
        BOOTPROTO=none
        ONBOOT=yes
        #NETWORK=10.0.1.0       <- REMOVE
        #NETMASK=255.255.255.0  <- REMOVE
        #IPADDR=10.0.1.20       <- REMOVE
        #HWADDR=xx:xx:xx:xx:xx  <- REMOVE
        #USERCTL=no             <- REMOVE
        # Remove any other *unique* or non-desired settings, such as UUID.
        
      2. Ensure that a DHCP configuration remains that does not include a HWADDR or any unique information.
        DEVICE=eth[x]
        BOOTPROTO=dhcp
        ONBOOT=yes
        
      3. Ensure that the file includes the following lines:
        DEVICE=eth[x]
        ONBOOT=yes
        
    3. If the following files exist, ensure that they contain the same content:
      • /etc/sysconfig/networking/devices/ifcfg-eth[x]
      • /etc/sysconfig/networking/profiles/default/ifcfg-eth[x]

      Note

      If NetworkManager or any special settings were used with the virtual machine, ensure that any additional unique information is removed from the ifcfg scripts.
  3. Remove registration details

    1. Remove registration details using one of the following:
      • For Red Hat Network (RHN) registered guest virtual machines, use the following command:
        # rm /etc/sysconfig/rhn/systemid
      • For Red Hat Subscription Manager (RHSM) registered guest virtual machines:
        • If the original virtual machine will not be used, use the following commands:
          # subscription-manager unsubscribe --all
          # subscription-manager unregister
          # subscription-manager clean
        • If the original virtual machine will be used, run only the following command:
          # subscription-manager clean
          The original RHSM profile remains in the Portal. To reactivate your RHSM registration on the virtual machine after it is cloned, do the following:
          1. Obtain your customer identity code:
            # subscription-manager identity
            subscription-manager identity: 71rd64fx-6216-4409-bf3a-e4b7c7bd8ac9
            
          2. Register the virtual machine using the obtained ID code:
            # subscription-manager register --consumerid=71rd64fx-6216-4409-bf3a-e4b7c7bd8ac9
  4. Remove other unique details

    1. Remove any sshd public/private key pairs using the following command:
      # rm -rf /etc/ssh/ssh_host_*

      Note

      Removing ssh keys prevents problems with ssh clients not trusting these hosts.
    2. Remove any other application-specific identifiers or configurations that may cause conflicts if running on multiple machines.
  5. Configure the virtual machine to run configuration wizards on the next boot

    1. Configure the virtual machine to run the relevant configuration wizards the next time it is booted by doing one of the following:
      • For Red Hat Enterprise Linux 6 and below, create an empty file on the root file system called .unconfigured using the following command:
        # touch /.unconfigured
      • For Red Hat Enterprise Linux 7, enable the first boot and initial-setup wizards by running the following commands:
        # sed -ie 's/RUN_FIRSTBOOT=NO/RUN_FIRSTBOOT=YES/' /etc/sysconfig/firstboot
        # systemctl enable firstboot-graphical
        # systemctl enable initial-setup-graphical

      Note

      The wizards that run on the next boot depend on the configurations that have been removed from the virtual machine. In addition, on the first boot of the clone, it is recommended that you change the hostname.

4.2. Cloning a Virtual Machine

Before proceeding with cloning, shut down the virtual machine. You can clone the virtual machine using virt-clone or virt-manager.

4.2.1. Cloning Guests with virt-clone

You can use virt-clone to clone virtual machines from the command line.
Note that you need root privileges for virt-clone to complete successfully.
The virt-clone command provides a number of options that can be passed on the command line. These include general options, storage configuration options, networking configuration options, and miscellaneous options. Only the --original option is required. To see a complete list of options, enter the following command:
# virt-clone --help
The virt-clone man page also documents each command option, important variables, and examples.
The following example shows how to clone a guest virtual machine called "demo" on the default connection, automatically generating a new name and disk clone path.

Example 4.1. Using virt-clone to clone a guest

# virt-clone --original demo --auto-clone
The following example shows how to clone a QEMU guest virtual machine called "demo" with multiple disks.

Example 4.2. Using virt-clone to clone a guest with multiple disks

# virt-clone --connect qemu:///system --original demo --name newdemo --file /var/lib/libvirt/images/newdemo.img --file /var/lib/libvirt/images/newdata.img

4.2.2. Cloning Guests with virt-manager

This procedure describes cloning a guest virtual machine using the virt-manager utility.

Procedure 4.2. Cloning a Virtual Machine with virt-manager

  1. Open virt-manager

    Start virt-manager. Launch the Virtual Machine Manager application from the Applications menu and System Tools submenu. Alternatively, run the virt-manager command as root.
    Select the guest virtual machine you want to clone from the list of guest virtual machines in Virtual Machine Manager.
    Right-click the guest virtual machine you want to clone and select Clone. The Clone Virtual Machine window opens.
    Clone Virtual Machine window

    Figure 4.1. Clone Virtual Machine window

  2. Configure the clone

    • To change the name of the clone, enter a new name for the clone.
    • To change the networking configuration, click Details.
      Enter a new MAC address for the clone.
      Click OK.
      Change MAC Address window

      Figure 4.2. Change MAC Address window

    • For each disk in the cloned guest virtual machine, select one of the following options:
      • Clone this disk - The disk will be cloned for the cloned guest virtual machine
      • Share disk with guest virtual machine name - The disk will be shared by the guest virtual machine that will be cloned and its clone
      • Details - Opens the Change storage path window, which enables selecting a new path for the disk
        Change storage path window

        Figure 4.3. Change storage path window

  3. Clone the guest virtual machine

    Click Clone.

Chapter 5. KVM Paravirtualized (virtio) Drivers

Paravirtualized drivers enhance the performance of guests, decreasing guest I/O latency and increasing throughput almost to bare-metal levels. It is recommended to use the paravirtualized drivers for fully virtualized guests running I/O-heavy tasks and applications.
Virtio drivers are KVM's paravirtualized device drivers, available for guest virtual machines running on KVM hosts. These drivers are included in the virtio package. The virtio package supports block (storage) devices and network interface controllers.

Note

PCI devices are limited by the virtualized system architecture. See Chapter 16, Guest Virtual Machine Device Configuration for additional limitations when using assigned devices.

5.1. Using KVM virtio Drivers for Existing Storage Devices

You can modify an existing hard disk device attached to the guest to use the virtio driver instead of the virtualized IDE driver. The example shown in this section edits libvirt configuration files. Note that the guest virtual machine does not need to be shut down to perform these steps; however, the change will not be applied until the guest is completely shut down and rebooted.

Procedure 5.1. Using KVM virtio drivers for existing devices

  1. Ensure that you have installed the appropriate driver (viostor), before continuing with this procedure.
  2. Run the virsh edit guestname command as root to edit the XML configuration file for your device. For example, virsh edit guest1. The configuration files are located in the /etc/libvirt/qemu/ directory.
  3. Below is a file-based block device using the virtualized IDE driver. This is a typical entry for a virtual machine not using the virtio drivers.
    <disk type='file' device='disk'>
    	 ...
       <source file='/var/lib/libvirt/images/disk1.img'/>
       <target dev='hda' bus='ide'/>
    	 <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
  4. Change the entry to use the virtio device by modifying the bus= entry to virtio. Note that if the disk was previously IDE, it has a target similar to hda, hdb, or hdc. When changing to bus=virtio the target needs to be changed to vda, vdb, or vdc accordingly.
    <disk type='file' device='disk'>
       ...
       <source file='/var/lib/libvirt/images/disk1.img'/>
       <target dev='vda' bus='virtio'/>
    	 <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
  5. Remove the address tag inside the disk tags. This must be done for this procedure to work. Libvirt will regenerate the address tag appropriately the next time the virtual machine is started.
Alternatively, virt-manager, virsh attach-disk or virsh attach-interface can add a new device using the virtio drivers.
See the libvirt website for more details on using Virtio: http://www.linux-kvm.org/page/Virtio
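For example, the following is a hedged sketch of attaching a new virtio disk with virsh attach-disk; the guest name guest1 and the image path are placeholders, and the image must already exist (for instance, created with qemu-img):
# virsh attach-disk guest1 /var/lib/libvirt/images/newdisk.img vdb --targetbus virtio --persistent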

5.2. Using KVM virtio Drivers for New Storage Devices

This procedure covers creating new storage devices using the KVM virtio drivers with virt-manager.
Alternatively, the virsh attach-disk or virsh attach-interface commands can be used to attach devices using the virtio drivers.

Important

Ensure the drivers have been installed on the guest before proceeding to install new devices. If the drivers are unavailable, the device will not be recognized and will not work.

Procedure 5.2. Adding a storage device using the virtio storage driver

  1. Open the guest virtual machine by double clicking the name of the guest in virt-manager.
  2. Open the Show virtual hardware details tab by clicking the lightbulb icon in the toolbar.
  3. In the Show virtual hardware details tab, click the Add Hardware button.
  4. Select hardware type

    Select Storage as the Hardware type.
    The Add new virtual hardware wizard with Storage selected as the hardware type.

    Figure 5.1. The Add new virtual hardware wizard

  5. Select the storage device and driver

    Create a new disk image or select a storage pool volume.
    Set the Device type to Disk device and the Bus type to VirtIO to use the virtio drivers.
    The Add new virtual hardware wizard Storage window, with "Create a disk image on the computer's hard drive" selected.

    Figure 5.2. The Add New Virtual Hardware wizard

    Click Finish to complete the procedure.

Procedure 5.3. Adding a network device using the virtio network driver

  1. Open the guest virtual machine by double clicking the name of the guest in virt-manager.
  2. Open the Show virtual hardware details tab by clicking the lightbulb icon in the toolbar.
  3. In the Show virtual hardware details tab, click the Add Hardware button.
  4. Select hardware type

    Select Network as the Hardware type.
    The Add new virtual hardware wizard with Network selected as the hardware type.

    Figure 5.3. The Add new virtual hardware wizard

  5. Select the network device and driver

    Set the Device model to virtio to use the virtio drivers. Choose the required Host device.
    The Add new virtual hardware wizard Network window, with Device model set to virtio.

    Figure 5.4. The Add new virtual hardware wizard

    Click Finish to complete the procedure.
Once all new devices are added, reboot the virtual machine. Virtual machines may not recognize the devices until the guest is rebooted.

5.3. Using KVM virtio Drivers for Network Interface Devices

When network interfaces use KVM virtio drivers, KVM does not emulate networking hardware, which removes processing overhead and can increase guest performance. In Red Hat Enterprise Linux 7, virtio is used as the default network interface type. However, if this is configured differently on your system, you can use the following procedures:
  • To attach a virtio network device to a guest, use the virsh attach-interface command with the --model virtio option, as shown in the sketch after this list.
    Alternatively, in the virt-manager interface, navigate to the guest's Virtual hardware details screen and click Add Hardware. In the Add New Virtual Hardware screen, select Network, and change Device model to virtio:
  • To change the type of an existing interface to virtio, use the virsh edit command to edit the XML configuration of the intended guest, and change the model type attribute to virtio, for example as follows:
      <devices>
        <interface type='network'>
          <source network='default'/>
          <target dev='vnet1'/>
          <model type='virtio'/>
          <driver name='vhost' txmode='iothread' ioeventfd='on' event_idx='off'/>
        </interface>
      </devices>
      ...
    Alternatively, in the virt-manager interface, navigate to the guest's Virtual hardware details screen, select the NIC item, and change Device model to virtio.
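As a minimal sketch of the virsh attach-interface alternative mentioned above, the following attaches a virtio interface connected to the default virtual network; the guest name rhel7 and the use of the default network are assumptions for illustration:
# virsh attach-interface rhel7 network default --model virtio --config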

Note

If the naming of network interfaces inside the guest is not consistent across reboots, ensure all interfaces presented to the guest are of the same device model, preferably virtio-net. For details, see the Red Hat Knowledgebase.

Chapter 6. Network Configuration

This chapter provides an introduction to the common networking configurations used by libvirt-based guest virtual machines.
Red Hat Enterprise Linux 7 supports the following networking setups for virtualization:
  • virtual networks using Network Address Translation (NAT)
  • directly allocated physical devices using PCI device assignment
  • directly allocated virtual functions using PCIe SR-IOV
  • bridged networks
To allow external hosts to access network services on guest virtual machines, you must enable NAT or network bridging, or directly assign a PCI device to the guest.

6.1. Network Address Translation (NAT) with libvirt

One of the most common methods for sharing network connections is to use Network Address Translation (NAT) forwarding (also known as virtual networks).
Host Configuration

Every standard libvirt installation provides NAT-based connectivity to virtual machines as the default virtual network. Verify that it is available with the virsh net-list --all command.

# virsh net-list --all
Name                 State      Autostart
-----------------------------------------
default              active     yes
If it is missing, check the libvirt configuration directory, which contains the guest definition files and the networks subdirectory:
# ll /etc/libvirt/qemu/
total 12
drwx------. 3 root root 4096 Nov  7 23:02 networks
-rw-------. 1 root root 2205 Nov 20 01:20 r6.4.xml
-rw-------. 1 root root 2208 Nov  8 03:19 r6.xml
The default network is defined in /etc/libvirt/qemu/networks/default.xml.
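For reference, a minimal default network definition looks similar to the following sketch (the UUID and MAC address lines that libvirt adds are omitted here); if the definition was removed, a file with equivalent content can be re-imported with the virsh net-define command:
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>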
Mark the default network to automatically start:
# virsh net-autostart default
Network default marked as autostarted
Start the default network:
# virsh net-start default
Network default started
Once the libvirt default network is running, you will see an isolated bridge device. This device does not have any physical interfaces added. The new device uses NAT and IP forwarding to connect to the physical network. Do not add new interfaces.
# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.000000000000       yes
libvirt adds iptables rules which allow traffic to and from guest virtual machines attached to the virbr0 device in the INPUT, FORWARD, OUTPUT and POSTROUTING chains. libvirt then attempts to enable the ip_forward parameter. Some other applications may disable ip_forward, so the best option is to add the following to /etc/sysctl.conf.
 net.ipv4.ip_forward = 1
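To apply the setting immediately without rebooting, load the values from the file, for example:
# sysctl -p /etc/sysctl.conf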
Guest Virtual Machine Configuration

Once the host configuration is complete, a guest virtual machine can be connected to the virtual network based on its name. To connect a guest to the 'default' virtual network, the following can be used in the XML configuration file (such as /etc/libvirt/qemu/myguest.xml) for the guest:

<interface type='network'>
   <source network='default'/>
</interface>

Note

Defining a MAC address is optional. If you do not define one, a MAC address is automatically generated and used as the MAC address of the bridge device used by the network. Manually setting the MAC address may be useful to maintain consistency or easy reference throughout your environment, or to avoid the very small chance of a conflict.
<interface type='network'>
  <source network='default'/>
  <mac address='00:16:3e:1a:b3:4a'/>
</interface>

6.2. Disabling vhost-net

The vhost-net module is a kernel-level back end for virtio networking that reduces virtualization overhead by moving virtio packet processing tasks out of user space (the QEMU process) and into the kernel (the vhost-net driver). vhost-net is only available for virtio network interfaces. If the vhost-net kernel module is loaded, it is enabled by default for all virtio interfaces, but can be disabled in the interface configuration if a particular workload experiences a degradation in performance when vhost-net is in use.
Specifically, when UDP traffic is sent from a host machine to a guest virtual machine on that host, performance degradation can occur if the guest virtual machine processes incoming data at a rate slower than the host machine sends it. In this situation, enabling vhost-net causes the UDP socket's receive buffer to overflow more quickly, which results in greater packet loss. It is therefore better to disable vhost-net in this situation to slow the traffic, and improve overall performance.
To disable vhost-net, edit the <interface> sub-element in the guest virtual machine's XML configuration file and define the network as follows:
<interface type="network">
   ...
   <model type="virtio"/>
   <driver name="qemu"/>
   ...
</interface>
Setting the driver name to qemu forces packet processing into QEMU user space, effectively disabling vhost-net for that interface.

6.3. Enabling vhost-net zero-copy

In Red Hat Enterprise Linux 7, vhost-net zero-copy is disabled by default. To enable this action on a permanent basis, add a new file vhost-net.conf to /etc/modprobe.d with the following content:
options vhost_net experimental_zcopytx=1
If you want to disable zero-copy again without editing the file, reload the module with the parameter set to 0:
modprobe -r vhost_net
modprobe vhost_net experimental_zcopytx=0
The first command unloads the vhost_net module, and the second reloads it with zero-copy transmit disabled. The same method can be used to enable zero-copy, but the change does not persist across reboots.
To check the current state, view the value of the module parameter; 1 means zero-copy transmit is enabled and 0 means it is disabled:
$ cat /sys/module/vhost_net/parameters/experimental_zcopytx
0

6.4. Bridged Networking

Bridged networking (also known as network bridging or virtual network switching) is used to place virtual machine network interfaces on the same network as the physical interface. Bridges require minimal configuration and make a virtual machine appear on an existing network, which reduces management overhead and network complexity. As bridges contain few components and configuration variables, they provide a transparent setup which is straightforward to understand and troubleshoot, if required.
Bridging can be configured in a virtualized environment using standard Red Hat Enterprise Linux tools, virt-manager, or libvirt, and is described in the following sections.
However, even in a virtualized environment, bridges may be more easily created using the host operating system's networking tools. More information about this bridge creation method can be found in the Red Hat Enterprise Linux 7 Networking Guide.

6.4.1. Configuring Bridged Networking on a Red Hat Enterprise Linux 7 Host

Bridged networking can be configured for virtual machines on a Red Hat Enterprise Linux host, independent of the virtualization management tools. This configuration is mainly recommended when the virtualization bridge is the host's only network interface, or is the host's management network interface.
For instructions on configuring network bridging without using virtualization tools, see the Red Hat Enterprise Linux 7 Networking Guide.

6.4.2. Bridged Networking with Virtual Machine Manager

This section provides instructions on creating a bridge from a host machine's interface to a guest virtual machine using virt-manager.

Note

Depending on your environment, setting up a bridge with libvirt tools in Red Hat Enterprise Linux 7 may require disabling Network Manager, which is not recommended by Red Hat. A bridge created with libvirt also requires libvirtd to be running for the bridge to maintain network connectivity.
It is recommended to configure bridged networking on the physical Red Hat Enterprise Linux host as described in the Red Hat Enterprise Linux 7 Networking Guide, while using libvirt after bridge creation to add virtual machine interfaces to the bridges.

Procedure 6.1. Creating a bridge with virt-manager

  1. From the virt-manager main menu, click Edit ⇒ Connection Details to open the Connection Details window.
  2. Click the Network Interfaces tab.
  3. Click the + at the bottom of the window to configure a new network interface.
  4. In the Interface type drop-down menu, select Bridge, and then click Forward to continue.
    Adding a bridge

    Figure 6.1. Adding a bridge

    1. In the Name field, enter a name for the bridge, such as br0.
    2. Select a Start mode from the drop-down menu. Choose from one of the following:
      • none - deactivates the bridge
      • onboot - activates the bridge on the next guest virtual machine reboot
      • hotplug - activates the bridge even if the guest virtual machine is running
    3. Check the Activate now check box to activate the bridge immediately.
    4. To configure either the IP settings or Bridge settings, click the appropriate Configure button. A separate window will open to specify the required settings. Make any necessary changes and click OK when done.
    5. Select the physical interface to connect to your virtual machines. If the interface is currently in use by another guest virtual machine, you will receive a warning message.
  5. Click Finish and the wizard closes, taking you back to the Connections menu.
    Adding a bridge

    Figure 6.2. Adding a bridge

Select the bridge to use, and click Apply to exit the wizard.
To stop the interface, click the Stop Interface button. Once the bridge is stopped, to delete the interface, click the Delete Interface button.

6.4.3. Bridged Networking with libvirt

Depending on your environment, setting up a bridge with libvirt in Red Hat Enterprise Linux 7 may require disabling Network Manager, which is not recommended by Red Hat. This also requires libvirtd to be running for the bridge to operate.
It is recommended to configure bridged networking on the physical Red Hat Enterprise Linux host as described in the Red Hat Enterprise Linux 7 Networking Guide.

Important

libvirt is now able to take advantage of new kernel tunable parameters to manage host bridge forwarding database (FDB) entries, thus potentially improving system network performance when bridging multiple virtual machines. Set the macTableManager attribute of a network's <bridge> element to 'libvirt' in the host's XML configuration file:
<bridge name='br0' macTableManager='libvirt'/>
This will turn off learning (flood) mode on all bridge ports, and libvirt will add or remove entries to the FDB as necessary. Along with removing the overhead of learning the proper forwarding ports for MAC addresses, this also allows the kernel to disable promiscuous mode on the physical device that connects the bridge to the network, which further reduces overhead.
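As a minimal sketch, the attribute can be added by editing the network definition with virsh net-edit; the following uses the default NAT network purely as an illustration:
# virsh net-edit default
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' macTableManager='libvirt'/>
  ...
</network>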

Chapter 7. Overcommitting with KVM

7.1. Introduction

The KVM hypervisor automatically overcommits CPUs and memory. This means that more virtualized CPUs and memory can be allocated to virtual machines than there are physical resources on the system. This is possible because most processes do not access 100% of their allocated resources all the time.
As a result, under-utilized virtualized servers or desktops can run on fewer hosts, which saves a number of system resources, with the net effect of less power, cooling, and investment in server hardware.

7.2. Overcommitting Memory

Guest virtual machines running on a KVM hypervisor do not have dedicated blocks of physical RAM assigned to them. Instead, each guest virtual machine functions as a Linux process where the host physical machine's Linux kernel allocates memory only when requested. In addition the host's memory manager can move the guest virtual machine's memory between its own physical memory and swap space.
Overcommitting requires allotting sufficient swap space on the host physical machine to accommodate all guest virtual machines as well as enough memory for the host physical machine's processes. As a basic rule, the host physical machine's operating system requires a maximum of 4 GB of memory along with a minimum of 4 GB of swap space. For advanced instructions on determining an appropriate size for the swap partition, see the Red Hat Knowledgebase.
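As a rough illustrative calculation only, and not a sizing recommendation: a host with 32 GB of physical RAM whose guests are allocated a total of 44 GB of memory would need swap space that covers at least the overcommitted amount plus the host's own requirements, that is approximately 44 GB + 4 GB - 32 GB = 16 GB of swap.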

Important

Overcommitting is not an ideal solution for general memory issues. The recommended methods to deal with memory shortage are to allocate less memory per guest, add more physical memory to the host, or utilize swap space.
A virtual machine will run slower if it is swapped frequently. In addition, overcommitting can cause the system to run out of memory (OOM), which may lead to the Linux kernel shutting down important system processes. If you decide to overcommit memory, ensure sufficient testing is performed. Contact Red Hat support for assistance with overcommitting.
Overcommitting does not work with all virtual machines, but has been found to work in a desktop virtualization setup with minimal intensive usage or running several identical guests with KSM. For more information on KSM and overcommitting, see the Red Hat Enterprise Linux 7 Virtualization Tuning and Optimization Guide.

Important

Memory overcommit is not supported with device assignment. This is because when device assignment is in use, all virtual machine memory must be statically pre-allocated to enable direct memory access (DMA) with the assigned device.

7.3. Overcommitting Virtualized CPUs

The KVM hypervisor supports overcommitting virtualized CPUs (vCPUs). Virtualized CPUs can be overcommitted as far as load limits of guest virtual machines allow. Use caution when overcommitting vCPUs, as loads near 100% may cause dropped requests or unusable response times.
In Red Hat Enterprise Linux 7, it is possible to overcommit guests with more than one vCPU, known as symmetric multiprocessing (SMP) virtual machines. However, you may experience performance deterioration when running more cores on the virtual machine than are present on your physical CPU.
For example, a virtual machine with four vCPUs should not be run on a host machine with a dual core processor, but on a quad core host. Overcommitting SMP virtual machines beyond the physical number of processing cores causes significant performance degradation, due to programs getting less CPU time than required. In addition, it is not recommended to have more than 10 total allocated vCPUs per physical processor core.
With SMP guests, some processing overhead is inherent. CPU overcommitting can increase the SMP overhead, because using time-slicing to allocate resources to guests can make inter-CPU communication inside a guest slower. This overhead increases with guests that have a larger number of vCPUs, or a larger overcommit ratio.
Virtualized CPUs are overcommitted best when a single host has multiple guests, and each guest has a small number of vCPUs, compared to the number of host CPUs. KVM should safely support guests with loads under 100% at a ratio of five vCPUs (on 5 virtual machines) to one physical CPU on a single host. The KVM hypervisor will switch between all of the virtual machines, making sure that the load is balanced.
For best performance, Red Hat recommends assigning guests only as many vCPUs as are required to run the programs that are inside each guest.

Important

Applications that use 100% of memory or processing resources may become unstable in overcommitted environments. Do not overcommit memory or CPUs in a production environment without extensive testing, as the CPU overcommit ratio and the amount of SMP are workload-dependent.

Chapter 8. KVM Guest Timing Management

Virtualization involves several challenges for time keeping in guest virtual machines.
  • Interrupts cannot always be delivered simultaneously and instantaneously to all guest virtual machines. This is because interrupts in virtual machines are not true interrupts. Instead, they are injected into the guest virtual machine by the host machine.
  • The host may be running another guest virtual machine, or a different process. Therefore, the precise timing typically required by interrupts may not always be possible.
Guest virtual machines without accurate time keeping may experience issues with network applications and processes, as session validity, migration, and other network activities rely on timestamps to remain correct.
KVM avoids these issues by providing guest virtual machines with a paravirtualized clock (kvm-clock). However, it is still important to test timing before attempting activities that may be affected by time keeping inaccuracies, such as guest migration.

Important

To avoid the problems described above, the Network Time Protocol (NTP) should be configured on the host and the guest virtual machines. On guests using Red Hat Enterprise Linux 6 and earlier, NTP is implemented by the ntpd service. For more information, see the Red Hat Enterprise Linux 6 Deployment Guide.
On systems using Red Hat Enterprise Linux 7, NTP time synchronization service can be provided by ntpd or by the chronyd service. Note that Chrony has some advantages on virtual machines. For more information, see the Configuring NTP Using the chrony Suite and Configuring NTP Using ntpd sections in the Red Hat Enterprise Linux 7 System Administrator's Guide.
The mechanics of guest virtual machine time synchronization

By default, the guest synchronizes its time with the hypervisor as follows:

  • When the guest system boots, the guest reads the time from the emulated Real Time Clock (RTC).
  • When the NTP protocol is initiated, it automatically synchronizes the guest clock. Afterwards, during normal guest operation, NTP performs clock adjustments in the guest.
  • When a guest is resumed after a pause or a restoration process, a command to synchronize the guest clock to a specified value should be issued by the management software (such as virt-manager). This synchronization works only if the QEMU guest agent is installed in the guest and supports the feature. The value to which the guest clock synchronizes is usually the host clock value.

Constant Time Stamp Counter (TSC)

Modern Intel and AMD CPUs provide a constant Time Stamp Counter (TSC). The count frequency of the constant TSC does not vary when the CPU core itself changes frequency, for example to comply with a power-saving policy. A CPU with a constant TSC frequency is necessary in order to use the TSC as a clock source for KVM guests.

Your CPU has a constant Time Stamp Counter if the constant_tsc flag is present. To determine if your CPU has the constant_tsc flag enter the following command:
$ cat /proc/cpuinfo | grep constant_tsc
If any output is given, your CPU has the constant_tsc bit. If no output is given, follow the instructions below.
Configuring Hosts without a Constant Time Stamp Counter

Systems without a constant TSC frequency cannot use the TSC as a clock source for virtual machines, and require additional configuration. Power management features interfere with accurate time keeping and must be disabled for guest virtual machines to accurately keep time with KVM.

Important

These instructions are for AMD revision F CPUs only.
If the CPU lacks the constant_tsc bit, disable all power management features. Each system has several timers it uses to keep time. The TSC is not stable on the host, which is sometimes caused by cpufreq changes, deep C states, or migration to a host with a faster TSC. Deep C sleep states can stop the TSC. To prevent the kernel from using deep C states, append processor.max_cstate=1 to the kernel boot options. To make this change persistent, edit the value of the GRUB_CMDLINE_LINUX key in the /etc/default/grub file and regenerate the GRUB 2 configuration (for example, with grub2-mkconfig -o /boot/grub2/grub.cfg). For example, to disable deep C states on every boot, edit the entry as follows:
GRUB_CMDLINE_LINUX="processor.max_cstate=1"
Note that you can specify multiple parameters for the GRUB_CMDLINE_LINUX key, similarly to adding the parameters in the GRUB 2 boot menu.
To disable cpufreq (only necessary on hosts without the constant_tsc flag), install kernel-tools and enable the cpupower.service (systemctl enable cpupower.service). To change which CPU frequency governor is applied when the service starts and stops, edit the /etc/sysconfig/cpupower configuration file and change the CPUPOWER_START_OPTS and CPUPOWER_STOP_OPTS values. Valid governors can be found in the /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_governors files. For more information on this package or on power management and governors, see the Red Hat Enterprise Linux 7 Power Management Guide.
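For illustration, a sketch of the relevant lines in /etc/sysconfig/cpupower; the governor names shown are assumptions and must be among those listed in the scaling_available_governors files:
CPUPOWER_START_OPTS="frequency-set -g performance"
CPUPOWER_STOP_OPTS="frequency-set -g ondemand"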

8.1. Host-wide Time Synchronization

Virtual network devices in KVM guests do not support hardware timestamping, which means it is difficult to synchronize the clocks of guests that use a network protocol like NTP or PTP with better accuracy than tens of microseconds.
When a more accurate synchronization of the guests is required, it is recommended to synchronize the clock of the host using NTP or PTP with hardware timestamping, and to synchronize the guests to the host directly. Red Hat Enterprise Linux 7.5 and later provide a virtual PTP hardware clock (PHC), which enables the guests to synchronize to the host with a sub-microsecond accuracy.

Important

Note that for PHC to work properly, both the host and the guest need to be using RHEL 7.5 or later as the operating system (OS).
To enable the PHC device, do the following on the guest OS:
  1. Set the ptp_kvm module to load after reboot.
    # echo ptp_kvm > /etc/modules-load.d/ptp_kvm.conf
  2. Add the /dev/ptp0 clock as a reference to the chrony configuration:
    # echo "refclock PHC /dev/ptp0 poll 2" >> /etc/chrony.conf
  3. Restart the chrony daemon:
    # systemctl restart chronyd
  4. To verify the host-guest time synchronization has been configured correctly, use the chronyc sources command on a guest. The output should look similar to the following:
    # chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    #* PHC0                          0   2   377     4     -6ns[   -6ns] +/-  726ns
    

8.2. Required Time Management Parameters for Red Hat Enterprise Linux Guests

For certain Red Hat Enterprise Linux guest virtual machines, additional kernel parameters are required for their system time to be synchronized correctly. These parameters can be set by appending them to the end of the kernel line in the /etc/grub2.cfg file of the guest virtual machine.

Note

Red Hat Enterprise Linux 5.5 and later, Red Hat Enterprise Linux 6.0 and later, and Red Hat Enterprise Linux 7 use kvm-clock as their default clock source. Running kvm-clock avoids the need for additional kernel parameters, and is recommended by Red Hat.
The table below lists versions of Red Hat Enterprise Linux and the parameters required on the specified systems.

Table 8.1. Kernel parameter requirements

Red Hat Enterprise Linux version | Additional guest kernel parameters
7.0 and later on AMD64 and Intel 64 systems with kvm-clock | Additional parameters are not required
6.1 and later on AMD64 and Intel 64 systems with kvm-clock | Additional parameters are not required
6.0 on AMD64 and Intel 64 systems with kvm-clock | Additional parameters are not required
6.0 on AMD64 and Intel 64 systems without kvm-clock | notsc lpj=n

Note

The lpj parameter requires a numeric value equal to the loops per jiffy value of the specific CPU on which the guest virtual machine runs. If you do not know this value, do not set the lpj parameter.

8.3. Steal Time Accounting

Steal time is the amount of CPU time needed by a guest virtual machine that is not provided by the host. Steal time occurs when the host allocates these resources elsewhere: for example, to another guest.
Steal time is reported in the CPU time fields in /proc/stat. It is automatically reported by utilities such as top and vmstat. It is displayed as "%st", or in the "st" column. Note that it cannot be switched off.
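For example, in the CPU summary line of top output, steal time appears in the st field; the values below are purely illustrative:
%Cpu(s):  2.3 us,  1.0 sy,  0.0 ni, 89.7 id,  0.0 wa,  0.0 hi,  0.0 si,  7.0 st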
Large amounts of steal time indicate CPU contention, which can reduce guest performance. To relieve CPU contention, increase the guest's CPU priority or CPU quota, or run fewer guests on the host.

Chapter 9. Network Booting with libvirt

Guest virtual machines can be booted with PXE enabled. PXE allows guest virtual machines to boot and load their configuration off the network itself. This section demonstrates some basic configuration steps to configure PXE guests with libvirt.
This section does not cover the creation of boot images or PXE servers. Instead, it explains how to configure libvirt, in a private or bridged network, to boot a guest virtual machine with PXE booting enabled.

Warning

These procedures are provided only as an example. Ensure that you have sufficient backups before proceeding.

9.1. Preparing the Boot Server

To perform the steps in this chapter you will need:
  • A PXE Server (DHCP and TFTP) - This can be a libvirt internal server, manually-configured dhcpd and tftpd, dnsmasq, a server configured by Cobbler, or some other server.
  • Boot images - for example, PXELINUX configured manually or by Cobbler.

9.1.1. Setting up a PXE Boot Server on a Private libvirt Network

This example uses the default network. Perform the following steps:

Procedure 9.1. Configuring the PXE boot server

  1. Place the PXE boot images and configuration in /var/lib/tftpboot.
  2. Enter the following commands:
    # virsh net-destroy default
    # virsh net-edit default
  3. Edit the <ip> element in the configuration file for the default network to include the appropriate address, network mask, DHCP address range, and boot file, where BOOT_FILENAME represents the file name you are using to boot the guest virtual machine.
    <ip address='192.168.122.1' netmask='255.255.255.0'>
       <tftp root='/var/lib/tftpboot' />
       <dhcp>
          <range start='192.168.122.2' end='192.168.122.254' />
          <bootp file='BOOT_FILENAME' />
       </dhcp>
    </ip>
  4. Run:
    # virsh net-start default
  5. Boot the guest using PXE (refer to Section 9.2, “Booting a Guest Using PXE”).

9.2. Booting a Guest Using PXE

This section demonstrates how to boot a guest virtual machine with PXE.

9.2.1. Using bridged networking

Procedure 9.2. Booting a guest using PXE and bridged networking

  1. Ensure bridging is enabled such that the PXE boot server is available on the network.
  2. Boot a guest virtual machine with PXE booting enabled. You can use the virt-install command to create a new virtual machine with PXE booting enabled, as shown in the following example command:
    virt-install --pxe --network bridge=breth0 --prompt
    Alternatively, ensure that the guest network is configured to use your bridged network, and that the XML guest configuration file has a <boot dev='network'/> element inside the <os> element, as shown in the following example:
    <os>
       <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
       <boot dev='network'/>
       <boot dev='hd'/>
    </os>
    <interface type='bridge'>
       <mac address='52:54:00:5a:ad:cb'/>
       <source bridge='breth0'/>
       <target dev='vnet0'/>
       <alias name='net0'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

9.2.2. Using a Private libvirt Network

Procedure 9.3. Using a private libvirt network

  1. Boot a guest virtual machine using libvirt with PXE booting enabled. You can use the virt-install command to create/install a new virtual machine using PXE:
    virt-install --pxe --network network=default --prompt
Alternatively, ensure that the guest network is configured to use your private libvirt network, and that the XML guest configuration file has a <boot dev='network'/> element inside the <os> element, as shown in the following example:
<os>
   <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
   <boot dev='network'/>
   <boot dev='hd'/>
</os>
Also ensure that the guest virtual machine is connected to the private network:
<interface type='network'>
   <mac address='52:54:00:66:79:14'/>
   <source network='default'/>
   <target dev='vnet0'/>
   <alias name='net0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

Chapter 10. Registering the Hypervisor and Virtual Machine

Red Hat Enterprise Linux 6 and 7 require that every guest virtual machine is mapped to a specific hypervisor in order to ensure that every guest is allocated the same level of subscription service. To do this, you need to install a subscription agent that automatically detects all guest virtual machines (VMs) on each installed and registered KVM hypervisor, and in turn creates a mapping file on the host. This mapping file ensures that all guest VMs receive the following benefits:
  • Subscriptions specific to virtual systems are readily available and can be applied to all of the associated guest VMs.
  • All subscription benefits that can be inherited from the hypervisor are readily available and can be applied to all of the associated guest VMs.

Note

The information provided in this chapter is specific to Red Hat Enterprise Linux subscriptions only. If you also have a Red Hat Virtualization subscription, or a Red Hat Satellite subscription, you should also consult the virt-who information provided with those subscriptions. More information on Red Hat Subscription Management can also be found in the Red Hat Subscription Management Guide found on the customer portal.

10.1. Installing virt-who on the Host Physical Machine

  1. Register the KVM hypervisor

    Register the KVM hypervisor by running the subscription-manager register [options] command in a terminal as the root user on the host physical machine. More options are available using subscription-manager register --help. In cases where you are using a user name and password, use the credentials that are known to the Subscription Manager application. If this is your very first time subscribing and you do not have a user account, contact customer support. For example, to register the hypervisor as user 'admin' with the password 'secret', run the following command:
    [root@rhel-server ~]# subscription-manager register --username=admin --password=secret --auto-attach
  2. Install the virt-who packages

    Install the virt-who packages, by running the following command on the host physical machine:
    # yum install virt-who
  3. Create a virt-who configuration file

    For each hypervisor, add a configuration file in the /etc/virt-who.d/ directory. At a minimum, the file must contain the following snippet:
    [libvirt]
    type=libvirt
    
    For more detailed information on configuring virt-who, see Section 10.1.1, “Configuring virt-who”.
  4. Start the virt-who service

    Start the virt-who service by running the following command on the host physical machine:
    # systemctl start virt-who.service
    # systemctl enable virt-who.service
  5. Confirm virt-who service is receiving guest information

    At this point, the virt-who service will start collecting a list of domains from the host. Check the /var/log/rhsm/rhsm.log file on the host physical machine to confirm that the file contains a list of the guest VMs. For example:
    2015-05-28 12:33:31,424 DEBUG: Libvirt domains found: [{'guestId': '58d59128-cfbb-4f2c-93de-230307db2ce0', 'attributes': {'active': 0, 'virtWhoType': 'libvirt', 'hypervisorType': 'QEMU'}, 'state': 5}]
    

Procedure 10.1. Managing the subscription on the customer portal

  1. Subscribing the hypervisor

    As the virtual machines will be receiving the same subscription benefits as the hypervisor, it is important that the hypervisor has a valid subscription and that the subscription is available for the VMs to use.
    1. Log in to the Customer Portal

      Provide your Red Hat account credentials at the Red Hat Customer Portal to log in.
    2. Click the Systems link

      Go to the Systems section of the My Subscriptions interface.
    3. Select the hypervisor

      On the Systems page, there is a table of all subscribed systems. Click the name of the hypervisor (for example localhost.localdomain). In the details page that opens, click Attach a subscription and select all the subscriptions listed. Click Attach Selected. This will attach the host's physical subscription to the hypervisor so that the guests can benefit from the subscription.
  2. Subscribing the guest virtual machines - first time use

    This step is for those who have a new subscription and have never subscribed a guest virtual machine before. If you are adding virtual machines, skip this step. To consume the subscription assigned to the hypervisor profile on the machine running the virt-who service, auto subscribe by running the following command in a terminal on the guest virtual machine.
    [root@virt-who ~]# subscription-manager attach --auto
  3. Subscribing additional guest virtual machines

    If you just subscribed a virtual machine for the first time, skip this step. If you are adding additional virtual machines, note that running this command will not necessarily re-attach the same subscriptions to the guest virtual machine. This is because removing all subscriptions and then allowing auto-attach to resolve what is necessary for a given guest virtual machine may result in different subscriptions being consumed than before. This may not have any effect on your system, but it is something you should be aware of. If you used a manual attachment procedure to attach the virtual machine, which is not described below, you will need to re-attach those virtual machines manually, as auto-attach will not work. Use the following command to first remove the subscriptions for the old guests, and then use auto-attach to attach subscriptions to all the guests. Run these commands on the guest virtual machine.
    [root@virt-who ~]# subscription-manager remove --all
    [root@virt-who ~]# subscription-manager attach --auto
  4. Confirm subscriptions are attached

    Confirm that the subscription is attached to the hypervisor by running the following command on the guest virtual machine:
    [root@virt-who ~]# subscription-manager list --consumed
    Output similar to the following will be displayed. Pay attention to the Subscription Details. It should say 'Subscription is current'.
    [root@virt-who ~]# subscription-manager list --consumed
    +-------------------------------------------+
       Consumed Subscriptions
    +-------------------------------------------+
    Subscription Name:	Awesome OS with unlimited virtual guests
    Provides: 		Awesome OS Server Bits
    SKU: 			awesomeos-virt-unlimited
    Contract: 		0
    Account: 		######### Your account number #####
    Serial: 		######### Your serial number ######
    Pool ID: 		XYZ123                                             1
    Provides Management: 	No
    Active: 		True
    Quantity Used: 		1
    Service Level:
    Service Type:
    Status Details:		Subscription is current                       2
    Subscription Type:
    Starts: 		01/01/2015
    Ends: 			12/31/2015
    System Type: 		Virtual
    

    1

    The ID for the subscription to attach to the system is displayed here. You will need this ID if you need to attach the subscription manually.

    2

    Indicates if your subscription is current. If your subscription is not current, an error message appears. One example is Guest has not been reported on any host and is using a temporary unmapped guest subscription. In this case the guest needs to be subscribed. In other cases, use the information as indicated in Section 10.5.2, “I have subscription status errors, what do I do?”.
  5. Register additional guests

    When you install new guest VMs on the hypervisor, you must register the new VM and use the subscription attached to the hypervisor, by running the following commands on the guest virtual machine:
    # subscription-manager register
    # subscription-manager attach --auto
    # subscription-manager list --consumed

10.1.1. Configuring virt-who

The virt-who service is configured using the following files:
  • /etc/virt-who.conf - Contains general configuration information including the interval for checking connected hypervisors for changes.
  • /etc/virt-who.d/hypervisor_name.conf - Contains configuration information for a specific hypervisor.
A web-based wizard is provided to generate hypervisor configuration files and the snippets required for virt-who.conf. To run the wizard, browse to Red Hat Virtualization Agent (virt-who) Configuration Helper on the Customer Portal.
On the second page of the wizard, select the following options:
  • Where does your virt-who report to?: Subscription Asset Manager
  • Hypervisor Type: libvirt
Follow the wizard to complete the configuration. If the configuration is performed correctly, virt-who will automatically provide the selected subscriptions to existing and future guests on the specified hypervisor.
For more information on hypervisor configuration files, see the virt-who-config man page.

10.2. Registering a New Guest Virtual Machine

In cases where a new guest virtual machine is to be created on a host that is already registered and running, the virt-who service must also be running. This ensures that the virt-who service maps the guest to a hypervisor, so the system is properly registered as a virtual system. To register the virtual machine, enter the following command:
[root@virt-server ~]# subscription-manager register --username=admin --password=secret --auto-attach

10.3. Removing a Guest Virtual Machine Entry

If the guest virtual machine is running, unregister the system, by running the following command in a terminal window as root on the guest:
[root@virt-guest ~]# subscription-manager unregister
If the system has been deleted, however, the virt-who service cannot tell whether the system was deleted or is only paused. In that case, you must manually remove the system from the server side, using the following steps:
  1. Login to the Subscription Manager

    The Subscription Manager is located on the Red Hat Customer Portal. Login to the Customer Portal using your user name and password, by clicking the login icon at the top of the screen.
  2. Click the Subscriptions tab

    Click the Subscriptions tab.
  3. Click the Systems link

    Scroll down the page and click the Systems link.
  4. Delete the system

    To delete the system profile, locate the specified system's profile in the table, select the check box beside its name and click Delete.

10.4. Installing virt-who Manually

This section describes how to manually attach a subscription provided by the hypervisor.

Procedure 10.2. How to attach a subscription manually

  1. List subscription information and find the Pool ID

    First you need to list the available subscriptions which are of the virtual type. Enter the following command:
    [root@server1 ~]# subscription-manager list --avail --match-installed | grep 'Virtual' -B12
    Subscription Name: Red Hat Enterprise Linux ES (Basic for Virtualization)
    Provides:          Red Hat Beta
                       Oracle Java (for RHEL Server)
                       Red Hat Enterprise Linux Server
    SKU:               -------
    Pool ID:           XYZ123
    Available:         40
    Suggested:         1
    Service Level:     Basic
    Service Type:      L1-L3
    Multi-Entitlement: No
    Ends:              01/02/2017
    System Type:       Virtual
    
    Note the Pool ID displayed. Copy this ID as you will need it in the next step.
  2. Attach the subscription with the Pool ID

    Using the Pool ID you copied in the previous step run the attach command. Replace the Pool ID XYZ123 with the Pool ID you retrieved. Enter the following command:
    [root@server1 ~]# subscription-manager attach --pool=XYZ123
    
    Successfully attached a subscription for: Red Hat Enterprise Linux ES (Basic for Virtualization)
    

10.5. Troubleshooting virt-who

10.5.1. Why is the hypervisor status red?

Scenario: On the server side, you deploy a guest on a hypervisor that does not have a subscription. 24 hours later, the hypervisor displays its status as red. To remedy this situation, you must either get a subscription for that hypervisor, or permanently migrate the guest to a hypervisor with a subscription.

10.5.2. I have subscription status errors, what do I do?

Scenario: Any of the following error messages display:
  • System not properly subscribed
  • Status unknown
  • Late binding of a guest to a hypervisor through virt-who (host/guest mapping)
To find the reason for the error open the virt-who log file, named rhsm.log, located in the /var/log/rhsm/ directory.

Chapter 11. Enhancing Virtualization with the QEMU Guest Agent and SPICE Agent

Agents in Red Hat Enterprise Linux such as the QEMU guest agent and the SPICE agent can be deployed to help the virtualization tools run more optimally on your system. These agents are described in this chapter.

Note

To further optimize and tune host and guest performance, see the Red Hat Enterprise Linux 7 Virtualization Tuning and Optimization Guide.

11.1. QEMU Guest Agent

The QEMU guest agent runs inside the guest and allows the host machine to issue commands to the guest operating system using libvirt, helping with functions such as freezing and thawing filesystems. The guest operating system then responds to those commands asynchronously. The QEMU guest agent package, qemu-guest-agent, is installed by default in Red Hat Enterprise Linux 7.
This section covers the libvirt commands and options available to the guest agent.

Important

Note that it is only safe to rely on the QEMU guest agent when run by trusted guests. An untrusted guest may maliciously ignore or abuse the guest agent protocol, and although built-in safeguards exist to prevent a denial of service attack on the host, the host requires guest co-operation for operations to run as expected.
Note that QEMU guest agent can be used to enable and disable virtual CPUs (vCPUs) while the guest is running, thus adjusting the number of vCPUs without using the hot plug and hot unplug features. For more information, see Section 20.36.6, “Configuring Virtual CPU Count”.

11.1.1. Setting up Communication between the QEMU Guest Agent and Host

The host machine communicates with the QEMU guest agent through a VirtIO serial connection between the host and guest machines. A VirtIO serial channel is connected to the host via a character device driver (typically a Unix socket), and the guest listens on this serial channel.

Note

The qemu-guest-agent does not detect if the host is listening to the VirtIO serial channel. However, as the current use for this channel is to listen for host-to-guest events, the probability of a guest virtual machine running into problems by writing to the channel with no listener is very low. Additionally, the qemu-guest-agent protocol includes synchronization markers that allow the host physical machine to force a guest virtual machine back into sync when issuing a command, and libvirt already uses these markers, so that guest virtual machines are able to safely discard any earlier pending undelivered responses.

11.1.1.1. Configuring the QEMU Guest Agent on a Linux Guest

The QEMU guest agent can be configured on a running or shut down virtual machine. If configured on a running guest, the guest will start using the guest agent immediately. If the guest is shut down, the QEMU guest agent will be enabled at next boot.
Either virsh or virt-manager can be used to configure communication between the guest and the QEMU guest agent. The following instructions describe how to configure the QEMU guest agent on a Linux guest.

Procedure 11.1. Setting up communication between guest agent and host with virsh on a shut down Linux guest

  1. Shut down the virtual machine

    Ensure the virtual machine (named rhel7 in this example) is shut down before configuring the QEMU guest agent:
    # virsh shutdown rhel7 
  2. Add the QEMU guest agent channel to the guest XML configuration

    Edit the guest's XML file to add the QEMU guest agent details:
    # virsh edit rhel7
    Add the following to the guest's XML file and save the changes:
    <channel type='unix'>
       <target type='virtio' name='org.qemu.guest_agent.0'/>
    </channel>
  3. Start the virtual machine

    # virsh start rhel7
  4. Install the QEMU guest agent on the guest

    Install the QEMU guest agent if not yet installed in the guest virtual machine:
    # yum install qemu-guest-agent
  5. Start the QEMU guest agent in the guest

    Start the QEMU guest agent service in the guest:
    # systemctl start qemu-guest-agent
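    To verify that the agent is running in the guest, you can check the service status, for example:
    # systemctl status qemu-guest-agent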
Alternatively, the QEMU guest agent can be configured on a running guest with the following steps:

Procedure 11.2. Setting up communication between guest agent and host on a running Linux guest

  1. Create an XML file for the QEMU guest agent

    # cat agent.xml
    <channel type='unix'>
       <target type='virtio' name='org.qemu.guest_agent.0'/>
    </channel>
  2. Attach the QEMU guest agent to the virtual machine

    Attach the QEMU guest agent to the running virtual machine (named rhel7 in this example) with this command:
    # virsh attach-device rhel7 agent.xml
  3. Install the QEMU guest agent on the guest

    Install the QEMU guest agent if not yet installed in the guest virtual machine:
    # yum install qemu-guest-agent
  4. Start the QEMU guest agent in the guest

    Start the QEMU guest agent service in the guest:
    # systemctl start qemu-guest-agent

Procedure 11.3. Setting up communication between the QEMU guest agent and host with virt-manager

  1. Shut down the virtual machine

    Ensure the virtual machine is shut down before configuring the QEMU guest agent.
    To shut down the virtual machine, select it from the list of virtual machines in Virtual Machine Manager, then click the light switch icon from the menu bar.
  2. Add the QEMU guest agent channel to the guest

    Open the virtual machine's hardware details by clicking the lightbulb icon at the top of the guest window.
    Click the Add Hardware button to open the Add New Virtual Hardware window, and select Channel.
    Select the QEMU guest agent from the Name drop-down list and click Finish:
    Selecting the QEMU guest agent channel device

    Figure 11.1. Selecting the QEMU guest agent channel device

  3. Start the virtual machine

    To start the virtual machine, select it from the list of virtual machines in Virtual Machine Manager, then click the start (play) button in the menu bar.
  4. Install the QEMU guest agent on the guest

    Open the guest with virt-manager and install the QEMU guest agent if not yet installed in the guest virtual machine:
    # yum install qemu-guest-agent
  5. Start the QEMU guest agent in the guest

    Start the QEMU guest agent service in the guest:
    # systemctl start qemu-guest-agent
The QEMU guest agent is now configured on the rhel7 virtual machine.

11.2. Using the QEMU Guest Agent with libvirt

Installing the QEMU guest agent allows various libvirt commands to become more powerful. The guest agent enhances the following virsh commands:
  • virsh shutdown --mode=agent - This shutdown method is more reliable than virsh shutdown --mode=acpi, as virsh shutdown used with the QEMU guest agent is guaranteed to shut down a cooperative guest in a clean state. If the agent is not present, libvirt must instead rely on injecting an ACPI shutdown event, but some guests ignore that event and thus will not shut down.
    The same --mode options can be used with the virsh reboot command.
  • virsh snapshot-create --quiesce - Allows the guest to flush its I/O into a stable state before the snapshot is created, which allows use of the snapshot without having to perform a fsck or losing partial database transactions. The guest agent allows a high level of disk contents stability by providing guest co-operation.
  • virsh domfsfreeze and virsh domfsthaw - Quiesces the guest filesystem in isolation.
  • virsh domfstrim - Instructs the guest to trim its filesystem.
  • virsh domtime - Queries or sets the guest's clock.
  • virsh setvcpus --guest - Instructs the guest to take CPUs offline.
  • virsh domifaddr --source agent - Queries the guest operating system's IP address via the guest agent.
  • virsh domfsinfo - Shows a list of mounted filesystems within the running guest.
  • virsh set-user-password - Sets the password for a user account in the guest.
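The following is a minimal sketch showing a few of these commands in use; the guest name rhel7 is an assumption for illustration:
# virsh domtime rhel7
# virsh domfstrim rhel7
# virsh domifaddr rhel7 --source agent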

11.2.1. Creating a Guest Disk Backup

libvirt can communicate with qemu-guest-agent to ensure that snapshots of guest virtual machine file systems are consistent internally and ready to use as needed. Guest system administrators can write and install application-specific freeze/thaw hook scripts. Before freezing the filesystems, the qemu-guest-agent invokes the main hook script (included in the qemu-guest-agent package). The freezing process temporarily deactivates all guest virtual machine applications.
The snapshot process is comprised of the following steps:
  • File system applications / databases flush working buffers to the virtual disk and stop accepting client connections
  • Applications bring their data files into a consistent state
  • Main hook script returns
  • qemu-guest-agent freezes the filesystems and the management stack takes a snapshot
  • Snapshot is confirmed
  • Filesystem function resumes
Thawing happens in reverse order.
To create a snapshot of the guest's file system, run the virsh snapshot-create --quiesce --disk-only command (alternatively, run virsh snapshot-create-as guest_name --quiesce --disk-only, explained in further detail in Section 20.39.2, “Creating a Snapshot for the Current Guest Virtual Machine”).
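For example, a minimal sketch of creating such a snapshot; the guest name rhel7 and the snapshot name snap1 are assumptions for illustration:
# virsh snapshot-create-as rhel7 snap1 --quiesce --disk-only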

Note

An application-specific hook script might need various SELinux permissions in order to run correctly, as is done when the script needs to connect to a socket in order to talk to a database. In general, local SELinux policies should be developed and installed for such purposes. Accessing file system nodes should work out of the box, after issuing the restorecon -FvvR command listed in Table 11.1, “QEMU guest agent package contents” in the table row labeled /etc/qemu-ga/fsfreeze-hook.d/.
The qemu-guest-agent binary RPM includes the following files:

Table 11.1. QEMU guest agent package contents

File name Description
/usr/lib/systemd/system/qemu-guest-agent.service Service control script (start/stop) for the QEMU guest agent.
/etc/sysconfig/qemu-ga Configuration file for the QEMU guest agent, as it is read by the /usr/lib/systemd/system/qemu-guest-agent.service control script. The settings are documented in the file with shell script comments.
/usr/bin/qemu-ga QEMU guest agent binary file.
/etc/qemu-ga Root directory for hook scripts.
/etc/qemu-ga/fsfreeze-hook Main hook script. No modifications are needed here.
/etc/qemu-ga/fsfreeze-hook.d Directory for individual, application-specific hook scripts. The guest system administrator should copy hook scripts manually into this directory, ensure proper file mode bits for them, and then run restorecon -FvvR on this directory.
/usr/share/qemu-kvm/qemu-ga/ Directory with sample scripts (for example purposes only). The scripts contained here are not executed.
The main hook script, /etc/qemu-ga/fsfreeze-hook, logs its own messages, as well as the application-specific script's standard output and error messages, in the following log file: /var/log/qemu-ga/fsfreeze-hook.log. For more information, see the libvirt upstream website.

11.3. SPICE Agent

The SPICE agent helps run graphical applications such as virt-manager more smoothly, by helping integrate the guest operating system with the SPICE client.
For example, when resizing a window in virt-manager, the SPICE agent allows for automatic X session resolution adjustment to the client resolution. The SPICE agent also provides support for copy and paste between the host and guest, and prevents mouse cursor lag.
For system-specific information on the SPICE agent's capabilities, see the spice-vdagent package's README file.

11.3.1. Setting up Communication between the SPICE Agent and Host

The SPICE agent can be configured on a running or shut down virtual machine. If configured on a running guest, the guest will start using the agent immediately. If the guest is shut down, the SPICE agent will be enabled at next boot.
Either virsh or virt-manager can be used to configure communication between the guest and the SPICE agent. The following instructions describe how to configure the SPICE agent on a Linux guest.

Procedure 11.4. Setting up communication between the SPICE agent and host with virsh on a shut down Linux guest

  1. Shut down the virtual machine

    Ensure the virtual machine (named rhel7 in this example) is shut down before configuring the SPICE agent:
    # virsh shutdown rhel7 
  2. Add the SPICE agent channel to the guest XML configuration

    Edit the guest's XML file to add the SPICE agent details:
    # virsh edit rhel7
    Add the following to the guest's XML file and save the changes:
    <channel type='spicevmc'>
       <target type='virtio' name='com.redhat.spice.0'/>
    </channel>
  3. Start the virtual machine

    # virsh start rhel7
  4. Install the SPICE agent on the guest

    Install the SPICE agent if not yet installed in the guest virtual machine:
    # yum install spice-vdagent
  5. Start the SPICE agent in the guest

    Start the SPICE agent service in the guest:
    # systemctl start spice-vdagent
Alternatively, the SPICE agent can be configured on a running guest with the following steps:

Procedure 11.5. Setting up communication between SPICE agent and host on a running Linux guest

  1. Create an XML file for the SPICE agent

    # cat agent.xml
    <channel type='spicevmc'>
       <target type='virtio' name='com.redhat.spice.0'/>
    </channel>
  2. Attach the SPICE agent to the virtual machine

    Attach the SPICE agent to the running virtual machine (named rhel7 in this example) with this command:
    # virsh attach-device rhel7 agent.xml
  3. Install the SPICE agent on the guest

    Install the SPICE agent if not yet installed in the guest virtual machine:
    # yum install spice-vdagent
  4. Start the SPICE agent in the guest

    Start the SPICE agent service in the guest:
    # systemctl start spice-vdagent

Procedure 11.6. Setting up communication between the SPICE agent and host with virt-manager

  1. Shut down the virtual machine

    Ensure the virtual machine is shut down before configuring the SPICE agent.
    To shut down the virtual machine, select it from the list of virtual machines in Virtual Machine Manager, then click the light switch icon from the menu bar.
  2. Add the SPICE agent channel to the guest

    Open the virtual machine's hardware details by clicking the lightbulb icon at the top of the guest window.
    Click the Add Hardware button to open the Add New Virtual Hardware window, and select Channel.
    Select the SPICE agent from the Name drop-down list, edit the channel address, and click Finish:
    Selecting the SPICE agent channel device

    Figure 11.2. Selecting the SPICE agent channel device

  3. Start the virtual machine

    To start the virtual machine, select it from the list of virtual machines in Virtual Machine Manager, then click the start (play) button in the menu bar.
  4. Install the SPICE agent on the guest

    Open the guest with virt-manager and install the SPICE agent if not yet installed in the guest virtual machine:
    # yum install spice-vdagent
  5. Start the SPICE agent in the guest

    Start the SPICE agent service in the guest:
    # systemctl start spice-vdagent
The SPICE agent is now configured on the rhel7 virtual machine.

Chapter 12. Nested Virtualization

12.1. Overview

As of Red Hat Enterprise Linux 7.5, nested virtualization is available as a Technology Preview for KVM guest virtual machines. With this feature, a guest virtual machine (also referred to as level 1 or L1) that runs on a physical host (level 0 or L0) can act as a hypervisor, and create its own guest virtual machines (L2).
Nested virtualization is useful in a variety of scenarios, such as debugging hypervisors in a constrained environment and testing larger virtual deployments on a limited amount of physical resources. However, note that nested virtualization is not supported or recommended in production user environments, and is primarily intended for development and testing.
Nested virtualization relies on host virtualization extensions to function, and it should not be confused with running guests in a virtual environment using the QEMU Tiny Code Generator (TCG) emulation, which is not supported in Red Hat Enterprise Linux.

12.2. Setup

Follow these steps to enable, configure, and start using nested virtualization:
  1. Enable: The feature is disabled by default. To enable it, use the following procedure on the L0 host physical machine.
    For Intel:
    1. Check whether nested virtualization is available on your host system.
      $ cat /sys/module/kvm_intel/parameters/nested
      If this command returns Y or 1, the feature is enabled.
      If the command returns 0 or N, use steps 2 and 3.
    2. Unload the kvm_intel module:
      # modprobe -r kvm_intel
    3. Activate the nesting feature:
      # modprobe kvm_intel nested=1
    4. The nesting feature is now enabled only until the next reboot of the L0 host. To enable it permanently, add the following line to the /etc/modprobe.d/kvm.conf file:
      options kvm_intel nested=1
    For AMD:
    1. Check whether nested virtualization is available on your system:
      $ cat /sys/module/kvm_amd/parameters/nested
      If this command returns Y or 1, the feature is enabled.
      If the command returns 0 or N, use steps 2 and 3.
    2. Unload the kvm_amd module
      # modprobe -r kvm_amd
    3. Activate the nesting feature
      # modprobe kvm_amd nested=1
    4. The nesting feature is now enabled only until the next reboot of the L0 host. To enable it permanently, add the following line to the /etc/modprobe.d/kvm.conf file:
      options kvm_amd nested=1
  2. Configure your L1 virtual machine for nested virtualization using one of the following methods:
    virt-manager
    1. Open the GUI of the intended guest and click the Show Virtual Hardware Details icon.
    2. Select the Processor menu, and in the Configuration section, type host-passthrough in the Model field (do not use the drop-down selection), and click Apply.
    Domain XML
    Add the following line to the domain XML file of the guest:
    <cpu mode='host-passthrough'/>
    If the guest's XML configuration file already contains a <cpu> element, rewrite it.
  3. To start using nested virtualization, install an L2 guest within the L1 guest. To do this, follow the same procedure as when installing the L1 guest - see Chapter 3, Creating a Virtual Machine for more information.

12.3. Restrictions and Limitations

  • It is strongly recommended to run Red Hat Enterprise Linux 7.2 or later in the L0 host and the L1 guests. L2 guests can contain any guest system supported by Red Hat.
  • It is not supported to migrate L1 or L2 guests.
  • Use of L2 guests as hypervisors and creating L3 guests is not supported.
  • Not all features available on the host can be used by the L1 hypervisor. For instance, IOMMU/VT-d or APICv cannot be used by the L1 hypervisor.
  • To use nested virtualization, the host CPU must have the necessary feature flags. To determine whether the L0 and L1 hypervisors are set up correctly, use the cat /proc/cpuinfo command on both L0 and L1, and make sure that the following flags are listed for the respective CPUs on both hypervisors (see the example after this list):
    • For Intel - vmx (Hardware Virtualization) and ept (Extended Page Tables)
    • For AMD - svm (equivalent to vmx) and npt (equivalent to ept)
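For example, the following command prints the relevant flags when they are present. This is a minimal sketch for Intel CPUs; substitute svm and npt on AMD systems, and run the same check on both the L0 host and the L1 guest:
# grep -E -o 'vmx|ept' /proc/cpuinfo | sort -u
ept
vmx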

Part II. Administration

This part covers topics related to virtual machine administration and explains how virtualization features, such as virtual networking, storage, and PCI assignment, work. This part also provides instructions on device and guest virtual machine management with the qemu-img, virt-manager, and virsh tools.

Chapter 13. Managing Storage for Virtual Machines

This chapter provides information about storage for virtual machines. Virtual storage is abstracted from the physical storage that is allocated to a virtual machine. The storage is attached to the virtual machine using paravirtualized or emulated block device drivers.

13.1. Storage Concepts

A storage pool is a quantity of storage set aside for use by guest virtual machines. Storage pools are divided into storage volumes. Each storage volume is assigned to a guest virtual machine as a block device on a guest bus.
Storage pools and volumes are managed using libvirt. With libvirt's remote protocol, it is possible to manage all aspects of a guest virtual machine's life cycle, as well as the configuration of the resources required by the guest virtual machine. These operations can be performed on a remote host. As a result, a management application that uses libvirt, such as the Virtual Machine Manager, can enable a user to perform all the required tasks for configuring the host physical machine for a guest virtual machine. These include allocating resources, running the guest virtual machine, shutting it down, and de-allocating the resources, without requiring shell access or any other control channel.
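For example, a management client can run these operations against a remote host over SSH. This is a minimal sketch; the URI and hostname are illustrative:
# virsh -c qemu+ssh://root@host.example.com/system pool-list --all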
The libvirt API can be used to query the list of volumes in the storage pool or to get information regarding the capacity, allocation, and available storage in the storage pool. A storage volume in the storage pool may be queried to get information such as allocation and capacity, which may differ for sparse volumes.

Note

For more information about sparse volumes, see the Virtualization Getting Started Guide.
For storage pools that support it, the libvirt API can be used to create, clone, resize, and delete storage volumes. The APIs can also be used to upload data to storage volumes, download data from storage volumes, or wipe data from storage volumes.
Once a storage pool is started, a storage volume can be assigned to a guest using the storage pool name and storage volume name instead of the host path to the volume in the domain XML.

Note

For more information about the domain XML, see Chapter 23, Manipulating the Domain XML.
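For example, a guest disk can reference the storage pool and storage volume by name instead of by host path. The following is a minimal sketch; the pool and volume names are illustrative:
<disk type='volume' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source pool='guest_images_fs' volume='guest1.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>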
Storage pools can be stopped (destroyed). This removes the abstraction of the data, but keeps the data intact.
For example, consider an NFS server whose export is mounted with mount -t nfs nfs.example.com:/path/to/share /path/to/data. A storage administrator could define an NFS storage pool on the virtualization host to describe the exported server path and the client target path. This allows libvirt to perform the mount either automatically when libvirt is started or as needed while libvirt is running. Files in the NFS server's exported directory are listed as storage volumes within the NFS storage pool.
When the storage volume is added to the guest, the administrator does not need to add the target path to the volume. The administrator only needs to add the storage pool and storage volume by name. Therefore, if the target client path changes, it does not affect the virtual machine.
When the storage pool is started, libvirt mounts the share on the specified directory, just as if the system administrator logged in and executed mount nfs.example.com:/path/to/share /vmdata. If the storage pool is configured to autostart, libvirt ensures that the NFS shared disk is mounted on the directory specified when libvirt is started.
Once the storage pool is started, the files in the NFS shared disk are reported as storage volumes, and the storage volumes' paths may be queried using the libvirt API. The storage volumes' paths can then be copied into the section of a guest virtual machine's XML definition that describes the source storage for the guest virtual machine's block devices. In the case of NFS, an application that uses the libvirt API can create and delete storage volumes in the storage pool (files in the NFS share) up to the limit of the size of the pool (the storage capacity of the share).
Not all storage pool types support creating and deleting volumes. Stopping the storage pool (pool-destroy) undoes the start operation, in this case, unmounting the NFS share. The data on the share is not modified by the destroy operation, despite what the name of the command suggests. For more details, see man virsh.
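For example, assuming the NFS pool described above is defined with the name nfspool (the pool and volume names here are illustrative), the volumes and their paths can be queried, and the pool stopped, with virsh:
# virsh vol-list nfspool
# virsh vol-path guest1.img --pool nfspool
# virsh pool-destroy nfspool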

Procedure 13.1. Creating and Assigning Storage

This procedure provides a high-level understanding of the steps needed to create and assign storage for virtual machine guests.
  1. Create storage pools

    Create one or more storage pools from available storage media. For more information, see Section 13.2, “Using Storage Pools”.
  2. Create storage volumes

    Create one or more storage volumes from the available storage pools. For more information, see Section 13.3, “Using Storage Volumes”.
  3. Assign storage devices to a virtual machine.

    Assign one or more storage devices abstracted from storage volumes to a guest virtual machine. For more information, see Section 13.3.6, “Adding Storage Devices to Guests”.
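The following is a minimal end-to-end sketch of these three steps using virsh with a directory-based storage pool. All names and paths are illustrative; the details for each storage pool type are covered in the sections referenced above:
# virsh pool-define-as guest_pool dir --target /var/lib/libvirt/images/guest_pool
# virsh pool-build guest_pool
# virsh pool-start guest_pool
# virsh vol-create-as guest_pool guest1.qcow2 10G --format qcow2
# virsh attach-disk rhel7 /var/lib/libvirt/images/guest_pool/guest1.qcow2 vdb --subdriver qcow2 --persistent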

13.2. Using Storage Pools

This section provides information about using storage pools with virtual machines. It provides conceptual information, as well as detailed instructions on creating, configuring, and deleting storage pools using virsh commands and the Virtual Machine Manager.

13.2.1. Storage Pool Concepts

A storage pool is a file, directory, or storage device, managed by libvirt to provide storage to virtual machines. Storage pools are divided into storage volumes that store virtual machine images or are attached to virtual machines as additional storage. Multiple guests can share the same storage pool, allowing for better allocation of storage resources.
Storage pools can be either local or network-based (shared):
Local storage pools
Local storage pools are attached directly to the host server. They include local directories, directly attached disks, physical partitions, and Logical Volume Management (LVM) volume groups on local devices. Local storage pools are useful for development, testing, and small deployments that do not require migration or large numbers of virtual machines. Local storage pools may not be suitable for many production environments, because they cannot be used for live migration.
Networked (shared) storage pools
Networked storage pools include storage devices shared over a network using standard protocols. Networked storage is required when migrating virtual machines between hosts with virt-manager, but is optional when migrating with virsh.
For more information on migrating virtual machines, see Chapter 15, KVM Migration.
The following is a list of storage pool types supported by Red Hat Enterprise Linux:
  • Directory-based storage pools
  • Disk-based storage pools
  • Partition-based storage pools
  • GlusterFS storage pools
  • iSCSI-based storage pools
  • LVM-based storage pools
  • NFS-based storage pools
  • vHBA-based storage pools with SCSI devices
The following is a list of libvirt storage pool types that are not supported by Red Hat Enterprise Linux:
  • Multipath-based storage pool
  • RBD-based storage pool
  • Sheepdog-based storage pool
  • Vstorage-based storage pool
  • ZFS-based storage pool

Note

Some of the unsupported storage pool types appear in the Virtual Machine Manager interface. However, they should not be used.

13.2.2. Creating Storage Pools

This section provides general instructions for creating storage pools using virsh and the Virtual Machine Manager. Using virsh enables you to specify all parameters, whereas using Virtual Machine Manager provides a graphic method for creating simpler storage pools.

13.2.2.1. Creating Storage Pools with virsh

Note

This section shows the creation of a partition-based storage pool as an example.

Procedure 13.2. Creating Storage Pools with virsh

  1. Read recommendations and ensure that all prerequisites are met

    For some storage pools, this guide recommends that you follow certain practices. In addition, there are prerequisites for some types of storage pools. To see the recommendations and prerequisites, if any, see Section 13.2.3, “Storage Pool Specifics”.
  2. Define the storage pool

    Storage pools can be persistent or transient. A persistent storage pool survives a system restart of the host machine. A transient storage pool only exists until the host reboots.
    Do one of the following:
    • Define the storage pool using an XML file.
      a. Create a temporary XML file containing the storage pool information required for the new device.
      The XML file must contain specific fields, based on the storage pool type. For more information, see Section 13.2.3, “Storage Pool Specifics”.
      The following shows an example of a storage pool definition XML file. In this example, the file is saved to ~/guest_images.xml.
      <pool type='fs'>
        <name>guest_images_fs</name>
        <source>
          <device path='/dev/sdc1'/>
        </source>
        <target>
          <path>/guest_images</path>
        </target>
      </pool> 
      b. Use the virsh pool-define command to create a persistent storage pool or the virsh pool-create command to create and start a transient storage pool.
      # virsh pool-define ~/guest_images.xml
      Pool defined from guest_images_fs
      
      or
      # virsh pool-create ~/guest_images.xml
      Pool created from guest_images_fs
      c. Delete the XML file created in step a.
    • Use the virsh pool-define-as command to create a persistent storage pool or the virsh pool-create-as command to create a transient storage pool.
      The following examples create a persistent and then a transient filesystem-based storage pool based on /dev/sdc1 and mounted at the /guest_images directory.
      # virsh pool-define-as guest_images_fs fs - - /dev/sdc1 - "/guest_images"
      Pool guest_images_fs defined
      or
      # virsh pool-create-as guest_images_fs fs - - /dev/sdc1 - "/guest_images"
      Pool guest_images_fs created

      Note

      When using the virsh interface, option names in the commands are optional. If option names are not used, use dashes for fields that do not need to be specified.
  3. Verify that the pool was created

    List all existing storage pools using the virsh pool-list --all command.
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    guest_images_fs      inactive   no
    
  4. Define the storage pool target path

    Use the virsh pool-build command to create a storage pool target path for a pre-formatted file system storage pool, initialize the storage source device, and define the format of the data. Then use the virsh pool-list command to ensure that the storage pool is listed.
    # virsh pool-build guest_images_fs
    Pool guest_images_fs built
    # ls -la /guest_images
    total 8
    drwx------.  2 root root 4096 May 31 19:38 .
    dr-xr-xr-x. 25 root root 4096 May 31 19:38 ..
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    guest_images_fs      inactive   no
    

    Note

    Building the target path is only necessary for disk-based, file system-based, and logical storage pools. If libvirt detects that the source storage device's data format differs from the selected storage pool type, the build fails, unless the overwrite option is specified.
  5. Start the storage pool

    Use the virsh pool-start command to prepare the source device for usage.
    The action taken depends on the storage pool type. For example, for a file system-based storage pool, the virsh pool-start command mounts the file system. For an LVM-based storage pool, the virsh pool-start command activates the volume group using the vgchange command.
    Then use the virsh pool-list command to ensure that the storage pool is active.
    # virsh pool-start guest_images_fs
    Pool guest_images_fs started
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    guest_images_fs      active     no
    

    Note

    The virsh pool-start command is only necessary for persistent storage pools. Transient storage pools are automatically started when they are created.
  6. Turn on autostart (optional)

    By default, a storage pool defined with virsh is not set to automatically start each time libvirtd starts. You can configure the storage pool to start automatically using the virsh pool-autostart command.
    # virsh pool-autostart guest_images_fs
    Pool guest_images_fs marked as autostarted
    
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    guest_images_fs      active     yes
    
    The storage pool is now automatically started each time libvirtd starts.
  7. Verify the storage pool

    Verify that the storage pool was created correctly, the sizes reported are as expected, and the state is reported as running. Verify there is a "lost+found" directory in the target path on the file system, indicating that the device is mounted.
    # virsh pool-info guest_images_fs
    Name:           guest_images_fs
    UUID:           c7466869-e82a-a66c-2187-dc9d6f0877d0
    State:          running
    Persistent:     yes
    Autostart:      yes
    Capacity:       458.39 GB
    Allocation:     197.91 MB
    Available:      458.20 GB
    # mount | grep /guest_images
    /dev/sdc1 on /guest_images type ext4 (rw)
    # ls -la /guest_images
    total 24
    drwxr-xr-x.  3 root root  4096 May 31 19:47 .
    dr-xr-xr-x. 25 root root  4096 May 31 19:38 ..
    drwx------.  2 root root 16384 May 31 14:18 lost+found
    

13.2.2.2. Creating storage pools with Virtual Machine Manager

Note

This section shows the creation of a disk-based storage pool as an example.

Procedure 13.3. Creating Storage Pools with Virtual Machine Manager

  1. Prepare the medium on which the storage pool will be created

    This will differ for different types of storage pools. For details, see Section 13.2.3, “Storage Pool Specifics”.
    In this example, you may need to relabel the disk with a GUID Partition Table.
  2. Open the storage settings

    1. In Virtual Machine Manager, select the host connection you want to configure.
      Open the Edit menu and select Connection Details.
    2. Click the Storage tab in the Connection Details window.

      Figure 13.1. Storage tab

  3. Create a new storage pool

    Note

    Using Virtual Machine Manager, you can only create persistent storage pools. Transient storage pools can only be created using virsh.
    1. Add a new storage pool (part 1)

      Click the + button at the bottom of the window. The Add a New Storage Pool wizard appears.
      Enter a Name for the storage pool. This example uses the name guest_images_fs.
      Select a storage pool type to create from the Type drop-down list. This example uses fs: Pre-Formatted Block Device.

      Figure 13.2. Storage pool name and type

      Click the Forward button to continue.
    2. Add a new pool (part 2)


      Figure 13.3. Storage pool path

      Configure the storage pool with the relevant parameters. For information on the parameters for each type of storage pool, see Section 13.2.3, “Storage Pool Specifics”.
      For some types of storage pools, a Build Pool check box appears in the dialog. If you want to build the storage pool from the storage, check the Build Pool check box.
      Verify the details and click the Finish button to create the storage pool.

13.2.3. Storage Pool Specifics

This section provides information specific to each type of storage pool, including prerequisites, parameters, and additional information. It includes the following topics:

13.2.3.1. Directory-based storage pools

Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating a directory-based storage pool.

Table 13.1. Directory-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='dir'> [type] directory dir: Filesystem Directory
The name of the storage pool <name>name</name> [name] name Name
The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>target_path</path>
</target>

target path_to_pool Target Path
If you are using virsh to create the storage pool, continue by verifying that the pool was created.
Examples
The following is an example of an XML file for a storage pool based on the /guest_images directory:
<pool type='dir'>
  <name>dirpool</name>
  <target>
    <path>/guest_images</path>
  </target>
</pool>  
The following is an example of a command for creating a storage pool based on the /guest_images directory:
# virsh pool-define-as dirpool dir --target "/guest_images"
Pool dirpool defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating a storage pool based on the /guest_images directory:

Figure 13.4. Add a new directory-based storage pool example

13.2.3.2. Disk-based storage pools

Recommendations
Be aware of the following before creating a disk-based storage pool:
  • Depending on the version of libvirt being used, dedicating a disk to a storage pool may reformat and erase all data currently stored on the disk device. It is strongly recommended that you back up the data on the storage device before creating a storage pool.
  • Guests should not be given write access to whole disks or block devices (for example, /dev/sdb). Use partitions (for example, /dev/sdb1) or LVM volumes.
    If you pass an entire block device to the guest, the guest will likely partition it or create its own LVM groups on it. This can cause the host physical machine to detect these partitions or LVM groups and cause errors.
Prerequisites

Note

The steps in this section are only required if you do not run the virsh pool-build command.
Before a disk-based storage pool can be created on a host disk, the disk must be relabeled with a GUID Partition Table (GPT) disk label. GPT disk labels allow for creating up to 128 partitions on each device.
# parted /dev/sdb
GNU Parted 2.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel
New disk label type? gpt
(parted) quit
Information: You may need to update /etc/fstab.
#
After relabeling the disk, continue creating the storage pool by defining the storage pool.
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating a disk-based storage pool.

Table 13.2. Disk-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='disk'> [type] disk disk: Physical Disk Device
The name of the storage pool <name>name</name> [name] name Name
The path specifying the storage device. For example, /dev/sdb

<source>
  <device path='/dev/sdb'/>
</source>

source-dev path_to_disk Source Path
The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>/path_to_pool</path>
</target>

target path_to_pool Target Path
If you are using virsh to create the storage pool, continue with defining the storage pool.
Examples
The following is an example of an XML file for a disk-based storage pool:
<pool type='disk'>
  <name>phy_disk</name>
  <source>
    <device path='/dev/sdb'/>
    <format type='gpt'/>
  </source>
  <target>
    <path>/dev</path>
  </target>
</pool>  
The following is an example of a command for creating a disk-based storage pool:
# virsh pool-define-as phy_disk disk --source-format=gpt --source-dev=/dev/sdb --target /dev
Pool phy_disk defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating a disk-based storage pool:

Figure 13.5. Add a new disk-based storage pool example

13.2.3.3. Filesystem-based storage pools

Recommendations
Do not use the procedures in this section to assign an entire disk as a storage pool (for example, /dev/sdb). Guests should not be given write access to whole disks or block devices. This method should only be used to assign partitions (for example, /dev/sdb1) to storage pools.
Prerequisites

Note

This is only required if you do not run the virsh pool-build command.
To create a storage pool from a partition, format the partition with the ext4 file system.
# mkfs.ext4 /dev/sdc1
After formatting the file system, continue creating the storage pool by defining the storage pool.
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating a filesystem-based storage pool from a partition.

Table 13.3. Filesystem-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='fs'> [type] fs fs: Pre-Formatted Block Device
The name of the storage pool <name>name</name> [name] name Name
The path specifying the partition. For example, /dev/sdc1

<source>
  <device path='source_path' />

[source] path_to_partition Source Path
The filesystem type, for example ext4

  <format type='fs_type' />
</source>

[source format] FS-format N/A
The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>/path_to_pool</path>
</target>

[target] path_to_pool Target Path
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following is an example of an XML file for a filesystem-based storage pool:
<pool type='fs'>
  <name>guest_images_fs</name>
  <source>
    <device path='/dev/sdc1'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/guest_images</path>
  </target>
</pool>  
The following is an example of a command for creating a partition-based storage pool:
# virsh pool-define-as guest_images_fs fs --source-dev /dev/sdc1 --target /guest_images
Pool guest_images_fs defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating a filesystem-based storage pool:

Figure 13.6. Add a new filesystem-based storage pool example

13.2.3.4. GlusterFS-based storage pools

Recommendations
GlusterFS is a user space file system that uses File System in User Space (FUSE).
Prerequisites
Before a GlusterFS-based storage pool can be created on a host, a Gluster server must be prepared.

Procedure 13.4. Preparing a Gluster server

  1. Obtain the IP address of the Gluster server by listing its status with the following command:
    # gluster volume status
    Status of volume: gluster-vol1
    Gluster process						Port	Online	Pid
    ------------------------------------------------------------------------------
    Brick 222.111.222.111:/gluster-vol1 			49155	Y	18634
    
    Task Status of Volume gluster-vol1
    ------------------------------------------------------------------------------
    There are no active volume tasks
    
  2. If not installed, install the glusterfs-fuse package.
  3. If not enabled, enable the virt_use_fusefs boolean. Check that it is enabled.
    # setsebool virt_use_fusefs on
    # getsebool virt_use_fusefs
    virt_use_fusefs --> on
    
After ensuring that the required packages are installed and the boolean is enabled, continue creating the storage pool by defining the storage pool.
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating a GlusterFS-based storage pool.

Table 13.4. GlusterFS-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='gluster'> [type] gluster Gluster: Gluster Filesystem
The name of the storage pool <name>name</name> [name] name Name
The hostname or IP address of the Gluster server

<source>
  <host name='hostname' />

source-host hostname Host Name
The name of the Gluster server   <name>Gluster-name</name> source-name Gluster-name Source Name
The path on the Gluster server used for the storage pool

  <dir path='Gluster-path' />
</source>

source-path Gluster-path Source Path
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following is an example of an XML file for a GlusterFS-based storage pool:
<pool type='gluster'>
  <name>Gluster_pool</name>
  <source>
    <host name='111.222.111.222'/>
    <dir path='/'/>
    <name>gluster-vol1</name>
  </source>
</pool>  
The following is an example of a command for creating a GlusterFS-based storage pool:
# virsh pool-define-as --name Gluster_pool --type gluster --source-host 111.222.111.222 --source-name gluster-vol1 --source-path /
Pool Gluster_pool defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating a GlusterFS-based storage pool:

Figure 13.7. Add a new GlusterFS-based storage pool example

13.2.3.5. iSCSI-based storage pools

Recommendations
Internet Small Computer System Interface (iSCSI) is a network protocol for sharing storage devices. iSCSI connects initiators (storage clients) to targets (storage servers) using SCSI instructions over the IP layer.
Using iSCSI-based devices to store guest virtual machines allows for more flexible storage options, such as using iSCSI as a block storage device. The iSCSI devices use a Linux-IO (LIO) target. This is a multi-protocol SCSI target for Linux. In addition to iSCSI, LIO also supports Fibre Channel and Fibre Channel over Ethernet (FCoE).
Prerequisites
Before an iSCSI-based storage pool can be created, iSCSI targets must be created. iSCSI targets are created with the targetcli package, which provides a command set for creating software-backed iSCSI targets.

Procedure 13.5. Creating an iSCSI target

  1. Install the targetcli package

    # yum install targetcli
  2. Launch the targetcli command set

    # targetcli
  3. Create storage objects

    Create three storage objects (a block device, a fileio file, and a ramdisk) to serve as backstores.
    1. Create a block storage object
      1. Navigate to the /backstores/block directory.
      2. Run the create command.
        # create [block-name] [filepath]
        For example:
        # create block1 dev=/dev/sdb1
    2. Create a fileio object
      1. Navigate to the /backstores/fileio directory.
      2. Run the create command.
        # create [fileio-name] [image-name] [image-size]
        For example:
        # create fileio1 /foo.img 50M
    3. Create a ramdisk object
      1. Navigate to the /backstores/ramdisk directory.
      2. Run the create command.
        # create [ramdisk-name] [ramdisk-size]
        For example:
        # create ramdisk1 1M
    4. Make note of the names of the disks created in this step. They will be used later.
  4. Create an iSCSI target

    1. Navigate to the /iscsi directory.
    2. Create the target in one of two ways:
      • Run the create command with no parameters.
        The iSCSI qualified name (IQN) is generated automatically.
      • Run the create command specifying the IQN and the server. For example:
        # create iqn.2010-05.com.example.server1:iscsirhel7guest
  5. Define the portal IP address

    To export the block storage over iSCSI, the portal, LUNs, and access control lists (ACLs) must first be configured.
    The portal includes the IP address and TCP port that the target listens on, and the initiators to which it connects. iSCSI uses port 3260. This port is configured by default.
    To connect to port 3260:
    1. Navigate to the tpg1 directory under the iSCSI target created in the previous step.
    2. Run the following:
      # portals/ create
      This command makes all available IP addresses listen on port 3260.
      If you want only a single IP address to listen on port 3260, add the IP address to the end of the command. For example:
      # portals/ create 143.22.16.33
  6. Configure the LUNs and assign storage objects to the fabric

    This step uses the storage objects created in creating storage objects.
    1. Navigate to the luns directory for the TPG created in defining the portal IP address. For example:
      # iscsi>iqn.2010-05.com.example.server1:iscsirhel7guest
    2. Assign the first LUN to the ramdisk. For example:
      # create /backstores/ramdisk/ramdisk1
    3. Assign the second LUN to the block disk. For example:
      # create /backstores/block/block1
    4. Assign the third LUN to the fileio disk. For example:
      # create /backstores/fileio/fileio1
    5. List the resulting LUNs.
      /iscsi/iqn.20...csirhel7guest ls
      
      o- tpg1 ............................................................[enabled, auth]
        o- acls...................................................................[0 ACL]
        o- luns..................................................................[3 LUNs]
        | o- lun0......................................................[ramdisk/ramdisk1]
        | o- lun1...............................................[block/block1 (dev/vdb1)]
        | o- lun2................................................[fileio/file1 (foo.img)]
        o- portals.............................................................[1 Portal]
          o- IP-ADDRESS:3260.........................................................[OK]
      
  7. Create ACLs for each initiator

    Enable authentication when the initiator connects. You can also restrict specified LUNs to specified initiators. Targets and initiators have unique names. iSCSI initiators use IQNs.
    1. Find the IQN of the iSCSI initiator, using the initiator name. For example:
      # cat /etc/iscsi/initiator2.iscsi
      InitiatorName=iqn.2010-05.com.example.server1:iscsirhel7guest
      This IQN is used to create the ACLs.
    2. Navigate to the acls directory.
    3. Create ACLs by doing one of the following:
      • Create ACLs for all LUNs and initiators by running the create command with no parameters.
        # create
      • To create an ACL for a specific LUN and initiator, run the create command specifying the IQN of the iSCSI initiator. For example:
        # create iqn.2010-05.com.example.server1:888
      • Configure the kernel target to use a single user ID and password for all initiators.
        # set auth userid=user_ID
        # set auth password=password
        # set attribute authentication=1
        # set attribute generate_node_acls=1
    After completing this procedure, continue by securing the storage pool.
  8. Save the configuration

    Make the configuration persistent by overwriting the previous boot settings.
    # saveconfig
  9. Enable the service

    To apply the saved settings on the next boot, enable the service.
    # systemctl enable target.service
Optional procedures
There are a number of optional procedures that you can perform with the iSCSI targets before creating the iSCSI-based storage pool.

Procedure 13.6. Configuring a logical volume on a RAID array

  1. Create a RAID5 array

    For information on creating a RAID5 array, see the Red Hat Enterprise Linux 7 Storage Administration Guide.
  2. Create an LVM logical volume on the RAID5 array

    For information on creating an LVM logical volume on a RAID5 array, see the Red Hat Enterprise Linux 7 Logical Volume Manager Administration Guide.

Procedure 13.7. Testing discoverability

  • Ensure that the new iSCSI device is discoverable.

    # iscsiadm --mode discovery --type sendtargets --portal server1.example.com
    143.22.16.33:3260,1 iqn.2010-05.com.example.server1:iscsirhel7guest

Procedure 13.8. Testing device attachment

  1. Attach the new iSCSI device

    Attach the new device (iqn.2010-05.com.example.server1:iscsirhel7guest) to determine whether the device can be attached.
    # iscsiadm -d2 -m node --login
    scsiadm: Max file limits 1024 1024
    
    Logging in to [iface: default, target: iqn.2010-05.com.example.server1:iscsirhel7guest, portal: 143.22.16.33,3260]
    Login to [iface: default, target: iqn.2010-05.com.example.server1:iscsirhel7guest, portal: 143.22.16.33,3260] successful.
    
  2. Detach the device

    # iscsiadm -d2 -m node --logout
    scsiadm: Max file limits 1024 1024
    
    Logging out of session [sid: 2, target: iqn.2010-05.com.example.server1:iscsirhel7guest, portal: 143.22.16.33,3260
    Logout of [sid: 2, target: iqn.2010-05.com.example.server1:iscsirhel7guest, portal: 143.22.16.33,3260] successful.

Procedure 13.9. Using libvirt secrets for an iSCSI storage pool

Note

This procedure is required if a user_ID and password were defined when creating an iSCSI target.
User name and password parameters can be configured with virsh to secure an iSCSI storage pool. This can be configured before or after the pool is defined, but the pool must be started for the authentication settings to take effect.
  1. Create a libvirt secret file

    Create a libvirt secret file with a challenge-handshake authentication protocol (CHAP) user name. For example:
    <secret ephemeral='no' private='yes'>
        <description>Passphrase for the iSCSI example.com server</description>
        <usage type='iscsi'>
            <target>iscsirhel7secret</target>
        </usage>
    </secret>    
  2. Define the secret

    # virsh secret-define secret.xml
  3. Verify the UUID

    # virsh secret-list
    UUID                                  Usage
    --------------------------------------------------------------------------------
    2d7891af-20be-4e5e-af83-190e8a922360  iscsi iscsirhel7secret
  4. Assign a secret to the UUID

    Use the following commands to assign a secret to the UUID in the output of the previous step. This ensures that the CHAP username and password are in a libvirt-controlled secret list.
    # MYSECRET=`printf %s "password123" | base64`
    # virsh secret-set-value 2d7891af-20be-4e5e-af83-190e8a922360 $MYSECRET
  5. Add an authentication entry to the storage pool

    Modify the <source> entry in the storage pool's XML file using virsh edit, and add an <auth> element, specifying authentication type, username, and secret usage.
    For example:
    <pool type='iscsi'>
      <name>iscsirhel7pool</name>
        <source>
           <host name='192.168.122.1'/>
           <device path='iqn.2010-05.com.example.server1:iscsirhel7guest'/>
           <auth type='chap' username='redhat'>
              <secret usage='iscsirhel7secret'/>
           </auth>
        </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>     

    Note

    The <auth> sub-element exists in different locations within the guest XML's <pool> and <disk> elements. For a <pool>, <auth> is specified within the <source> element, as this describes where to find the pool sources, since authentication is a property of some pool sources (iSCSI and RBD). For a <disk>, which is a sub-element of a domain, the authentication to the iSCSI or RBD disk is a property of the disk.
    In addition, the <auth> sub-element for a disk differs from that of a storage pool.
    <auth username='redhat'>
      <secret type='iscsi' usage='iscsirhel7secret'/>
    </auth>  
  6. Activate the changes

    The storage pool must be started to activate these changes.
    • If the storage pool has not yet been started, follow the steps in Creating Storage Pools with virsh to define and start the storage pool.
    • If the pool has already been started, enter the following commands to stop and restart the storage pool:
      # virsh pool-destroy iscsirhel7pool
      # virsh pool-start iscsirhel7pool
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating an iSCSI-based storage pool.

Table 13.5. iSCSI-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='iscsi'> [type] iscsi iscsi: iSCSI Target
The name of the storage pool <name>name</name> [name] name Name
The name of the host.

<source>
  <host name='hostname' />

source-host hostname Host Name
The iSCSI IQN.

  <device path="iSCSI_IQN" />
</source>

source-dev iSCSI_IQN Source IQN
The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>/dev/disk/by-path</path>
</target>

target path_to_pool Target Path
(Optional) The IQN of the iSCSI initiator. This is only needed when the ACL restricts the LUN to a particular initiator.

<initiator>
  <iqn name='initiator0' />
</initiator>

See the note below. Initiator IQN

Note

The IQN of the iSCSI initiator can be determined using the virsh find-storage-pool-sources-as iscsi command.
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following is an example of an XML file for an iSCSI-based storage pool:
<pool type='iscsi'>
  <name>iSCSI_pool</name>
  <source>
    <host name='server1.example.com'/>
    <device path='iqn.2010-05.com.example.server1:iscsirhel7guest'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>
  
The following is an example of a command for creating an iSCSI-based storage pool:
# virsh pool-define-as --name iSCSI_pool --type iscsi --source-host server1.example.com --source-dev iqn.2010-05.com.example.server1:iscsirhel7guest --target /dev/disk/by-path
Pool iSCSI_pool defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating an iSCSI-based storage pool:

Figure 13.8. Add a new iSCSI-based storage pool example

13.2.3.6. LVM-based storage pools

Recommendations
Be aware of the following before creating an LVM-based storage pool:
  • LVM-based storage pools do not provide the full flexibility of LVM.
  • libvirt supports thin logical volumes, but does not provide the features of thin storage pools.
  • LVM-based storage pools are volume groups. You can create volume groups using either Logical Volume Manager commands or virsh commands. To manage the volume groups using the virsh interface, create them with virsh commands.
    For more information about volume groups, see the Red Hat Enterprise Linux Logical Volume Manager Administration Guide.
  • LVM-based storage pools require a full disk partition. If activating a new partition or device with these procedures, the partition will be formatted and all data will be erased. If using the host's existing Volume Group (VG) nothing will be erased. It is recommended to back up the storage device before commencing the following procedure.
    For information on creating LVM volume groups, see the Red Hat Enterprise Linux Logical Volume Manager Administration Guide.
  • If you create an LVM-based storage pool on an existing VG, you should not run the pool-build command.
After ensuring that the VG is prepared, continue creating the storage pool by defining the storage pool.
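For example, a dedicated partition can be prepared as a volume group for the pool as follows. This is a minimal sketch; the device and volume group names are illustrative, and all data on the partition is destroyed:
# pvcreate /dev/sdc1
# vgcreate libvirt_lvm /dev/sdc1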
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating an LVM-based storage pool.

Table 13.6. LVM-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='logical'> [type] logical logical: LVM Volume Group
The name of the storage pool <name>name</name> [name] name Name
The path to the device for the storage pool

<source>
  <device path='device_path' />

source-dev device_path Source Path
The name of the volume group   <name>VG-name</name> source-name VG-name Source Path
The virtual group format

  <format type='lvm2' />
</source>

source-format lvm2 N/A
The target path

<target>
  <path>target_path</path>
</target>

target target-path Target Path

Note

If the logical volume group is made of multiple disk partitions, there may be multiple source devices listed. For example:
<source>
  <device path='/dev/sda1'/>
  <device path='/dev/sdb3'/>
  <device path='/dev/sdc2'/>
  ...
  </source> 
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following is an example of an XML file for an LVM-based storage pool:
<pool type='logical'>
  <name>guest_images_lvm</name>
  <source>
    <device path='/dev/sdc'/>
    <name>libvirt_lvm</name>
    <format type='lvm2'/>
  </source>
  <target>
    <path>/dev/libvirt_lvm</path>
  </target>
</pool>  
The following is an example of a command for creating an LVM-based storage pool:
# virsh pool-define-as guest_images_lvm logical --source-dev=/dev/sdc --source-name libvirt_lvm --target /dev/libvirt_lvm
Pool guest_images_lvm defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating an LVM-based storage pool:

Figure 13.9. Add a new LVM-based storage pool example

13.2.3.7. NFS-based storage pools

Prerequisites
To create a Network File System (NFS)-based storage pool, an NFS server should already be configured to be used by the host machine. For more information about NFS, see the Red Hat Enterprise Linux Storage Administration Guide.
After ensuring that the NFS server is properly configured, continue creating the storage pool by defining the storage pool.
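For example, a minimal export on the NFS server might look like the following. This is only a sketch; the exported directory matches the example below and the export options are illustrative:
# cat /etc/exports
/home/net_mount *(rw,sync,no_root_squash)
# exportfs -ra
# systemctl enable nfs-server
# systemctl start nfs-server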
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating an NFS-based storage pool.

Table 13.7. NFS-based storage pool parameters

Description | XML | pool-define-as | Virtual Machine Manager
The type of storage pool <pool type='netfs'> [type] netfs netfs: Network Exported Directory
The name of the storage pool <name>name</name> [name] name Name
The hostname of the NFS server where the mount point is located. This can be a hostname or an IP address.

<source>
  <host name='host_name' />

source-host host_name Host Name
The directory used on the NFS server

  <dir path='source_path' />
</source>

source-path source_path Source Path
The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>/target_path</path>
</target>

target target_path Target Path
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following is an example of an XML file for an NFS-based storage pool:
<pool type='netfs'>
  <name>nfspool</name>
  <source>
    <host name='localhost'/>
    <dir path='/home/net_mount'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/nfspool</path>
  </target>
</pool>  
The following is an example of a command for creating an NFS-based storage pool:
# virsh pool-define-as nfspool netfs --source-host localhost --source-path /home/net_mount --target /var/lib/libvirt/images/nfspool
Pool nfspool defined
The following images show an example of the Virtual Machine Manager Add a New Storage Pool dialog boxes for creating an NFS-based storage pool:

Figure 13.10. Add a new NFS-based storage pool example

13.2.3.8. vHBA-based storage pools using SCSI devices

Note

You cannot use Virtual Machine Manager to create vHBA-based storage pools using SCSI devices.
Recommendations
N_Port ID Virtualization (NPIV) is a software technology that allows sharing of a single physical Fibre Channel host bus adapter (HBA). This allows multiple guests to see the same storage from multiple physical hosts, and thus allows for easier migration paths for the storage. As a result, there is no need for the migration to create or copy storage, as long as the correct storage path is specified.
In virtualization, the virtual host bus adapter, or vHBA, controls the Logical Unit Numbers (LUNs) for virtual machines. For a host to share one Fibre Channel device path between multiple KVM guests, a vHBA must be created for each virtual machine. A single vHBA must not be used by multiple KVM guests.
Each vHBA for NPIV is identified by its parent HBA and its own World Wide Node Name (WWNN) and World Wide Port Name (WWPN). The path to the storage is determined by the WWNN and WWPN values. The parent HBA can be defined as scsi_host# or as a WWNN/WWPN pair.

Note

If a parent HBA is defined as scsi_host# and hardware is added to the host machine, the scsi_host# assignment may change. Therefore, it is recommended that you define a parent HBA using a WWNN/WWPN pair.
It is recommended that you define a libvirt storage pool based on the vHBA, because this preserves the vHBA configuration.
Using a libvirt storage pool has two primary advantages:
  • The libvirt code can easily find the LUN's path using the virsh command output.
  • Virtual machine migration requires only defining and starting a storage pool with the same vHBA name on the target machine. To do this, the vHBA LUN, libvirt storage pool and volume name must be specified in the virtual machine's XML configuration. Refer to Section 13.2.3.8, “vHBA-based storage pools using SCSI devices” for an example.

Note

Before creating a vHBA, it is recommended that you configure storage array (SAN)-side zoning in the host LUN to provide isolation between guests and prevent the possibility of data corruption.
To create a persistent vHBA configuration, first create a libvirt 'scsi' storage pool XML file using the format below. When creating a single vHBA that uses a storage pool on the same physical HBA, it is recommended to use a stable location for the <path> value, such as one of the /dev/disk/by-{path|id|uuid|label} locations on your system.
When creating multiple vHBAs that use storage pools on the same physical HBA, the value of the <path> field must be only /dev/, otherwise storage pool volumes are visible only to one of the vHBAs, and devices from the host cannot be exposed to multiple guests with the NPIV configuration.
For more information on <path> and the elements in <target>, see upstream libvirt documentation.
Prerequisites
Before creating a vHBA-based storage pool with SCSI devices, create a vHBA:

Procedure 13.10. Creating a vHBA

  1. Locate HBAs on the host system

    To locate the HBAs on your host system, use the virsh nodedev-list --cap vports command.
    The following example shows a host that has two HBAs that support vHBA:
    # virsh nodedev-list --cap vports
    scsi_host3
    scsi_host4
    
  2. Check the HBA's details

    Use the virsh nodedev-dumpxml HBA_device command to see the HBA's details.
    # virsh nodedev-dumpxml scsi_host3
    The output from the command lists the <name>, <wwnn>, and <wwpn> fields, which are used to create a vHBA. <max_vports> shows the maximum number of supported vHBAs. For example:
    <device>
      <name>scsi_host3</name>
      <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3</path>
      <parent>pci_0000_10_00_0</parent>
      <capability type='scsi_host'>
        <host>3</host>
        <unique_id>0</unique_id>
        <capability type='fc_host'>
          <wwnn>20000000c9848140</wwnn>
          <wwpn>10000000c9848140</wwpn>
          <fabric_wwn>2002000573de9a81</fabric_wwn>
        </capability>
        <capability type='vport_ops'>
          <max_vports>127</max_vports>
          <vports>0</vports>
        </capability>
      </capability>
    </device>   
    In this example, the <max_vports> value shows there are a total of 127 virtual ports available for use in the HBA configuration. The <vports> value shows the number of virtual ports currently being used. These values update after creating a vHBA.
  3. Create a vHBA host device

    Create an XML file similar to one of the following for the vHBA host. In these examples, the file is named vhba_host3.xml.
    This example uses scsi_host3 to describe the parent vHBA.
    # cat vhba_host3.xml
    <device>
      <parent>scsi_host3</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
        </capability>
      </capability>
    </device>   
    This example uses a WWNN/WWPN pair to describe the parent vHBA.
    # cat vhba_host3.xml
    <device>
      <name>vhba</name>
      <parent wwnn='20000000c9848140' wwpn='10000000c9848140'/>
      <capability type='scsi_host'>
        <capability type='fc_host'>
        </capability>
      </capability>
    </device>   

    Note

    The WWNN and WWPN values must match those in the HBA details seen in Procedure 13.10, “Creating a vHBA”.
    The <parent> field specifies the HBA device to associate with this vHBA device. The details in the <device> tag are used in the next step to create a new vHBA device for the host. For more information on the nodedev XML format, see the libvirt upstream pages.
  4. Create a new vHBA on the vHBA host device

    To create a vHBA on the basis of vhba_host3, use the virsh nodedev-create command:
    # virsh nodedev-create vhba_host3.xml
    Node device scsi_host5 created from vhba_host3.xml
  5. Verify the vHBA

    Verify the new vHBA's details (scsi_host5) with the virsh nodedev-dumpxml command:
    # virsh nodedev-dumpxml scsi_host5
    <device>
      <name>scsi_host5</name>
      <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.0/host3/vport-3:0-0/host5</path>
      <parent>scsi_host3</parent>
      <capability type='scsi_host'>
        <host>5</host>
        <unique_id>2</unique_id>
        <capability type='fc_host'>
          <wwnn>5001a4a93526d0a1</wwnn>
          <wwpn>5001a4ace3ee047d</wwpn>
          <fabric_wwn>2002000573de9a81</fabric_wwn>
        </capability>
      </capability>
    </device>  
After verifying the vHBA, continue creating the storage pool by defining the storage pool.
Parameters
The following table provides a list of required parameters for the XML file, the virsh pool-define-as command, and the Virtual Machine Manager application, for creating a vHBA-based storage pool.

Table 13.8. vHBA-based storage pool parameters

Description | XML | pool-define-as
The type of storage pool <pool type='scsi'> scsi
The name of the storage pool <name>name</name> --adapter-name name
The identifier of the vHBA. The parent attribute is optional.

<source>
  <adapter type='fc_host'
  [parent=parent_scsi_device]
  wwnn='WWNN'
  wwpn='WWPN' />
</source>

[--adapter-parent parent]
--adapter-wwnn wwnn
--adapter-wwpn wwpn

The path specifying the target. This will be the path used for the storage pool.

<target>
  <path>target_path</path>
</target>

target path_to_pool

Important

When the <path> field is /dev/, libvirt generates a unique short device path for the volume device path. For example, /dev/sdc. Otherwise, the physical host path is used. For example, /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016044602198-lun-0. The unique short device path allows the same volume to be listed in multiple guests by multiple storage pools. If the physical host path is used by multiple guests, duplicate device type warnings may occur.

Note

The parent attribute can be used in the <adapter> field to identify the physical HBA parent from which NPIV LUNs can be used by varying paths. This field, scsi_hostN, is combined with the vports and max_vports attributes to complete the parent identification. The parent, parent_wwnn, parent_wwpn, or parent_fabric_wwn attributes provide varying degrees of assurance that the same HBA is used after the host reboots; see the example after this list.
  • If no parent is specified, libvirt uses the first scsi_hostN adapter that supports NPIV.
  • If only the parent is specified, problems can arise if additional SCSI host adapters are added to the configuration.
  • If parent_wwnn or parent_wwpn is specified, after the host reboots the same HBA is used.
  • If parent_fabric_wwn is used, after the host reboots an HBA on the same fabric is selected, regardless of the scsi_hostN used.
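For example, the following <source> element pins the parent HBA by WWNN and WWPN, reusing the values from the HBA shown earlier in this section. This is a sketch only; as noted after the examples below, these parent_* attributes cannot be set with virsh pool-define-as:
<source>
  <adapter type='fc_host' parent_wwnn='20000000c9848140' parent_wwpn='10000000c9848140' wwnn='5001a4a93526d0a1' wwpn='5001a4ace3ee047d'/>
</source>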
If you are using virsh to create the storage pool, continue with verifying that the storage pool was created.
Examples
The following are examples of XML files for vHBA-based storage pools. The first is an example of a storage pool that is the only storage pool on the HBA. The second is an example of a storage pool that is one of several storage pools using a single vHBA, and uses the parent attribute to identify the SCSI host device.
<pool type='scsi'>
  <name>vhbapool_host3</name>
  <source>
    <adapter type='fc_host' wwnn='5001a4a93526d0a1' wwpn='5001a4ace3ee047d'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool> 
<pool type='scsi'>
  <name>vhbapool_host3</name>
  <source>
    <adapter type='fc_host' parent='scsi_host3' wwnn='5001a4a93526d0a1' wwpn='5001a4ace3ee047d'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>  
The following is an example of a command for creating a vHBA-based storage pool:
# virsh pool-define-as vhbapool_host3 scsi --adapter-parent scsi_host3 --adapter-wwnn 5001a4a93526d0a1 --adapter-wwpn 5001a4ace3ee047d --target /dev/disk/by-path
Pool vhbapool_host3 defined

Note

The virsh command does not provide a way to define the parent_wwnn, parent_wwpn, or parent_fabric_wwn attributes.
Configuring a virtual machine to use a vHBA LUN
After a storage pool is created for a vHBA, the vHBA LUN must be added to the virtual machine configuration.
  1. Create a disk volume on the virtual machine in the virtual machine's XML.
  2. Specify the storage pool and the storage volume in the <source> parameter.
The following shows an example:
<disk type='volume' device='disk'>
  <driver name='qemu' type='raw'/>
  <source pool='vhbapool_host3' volume='unit:0:4:0'/>
  <target dev='hda' bus='ide'/>
</disk>    
To specify a lun device instead of a disk, see the following example:
<disk type='volume' device='lun' sgio='unfiltered'>
  <driver name='qemu' type='raw'/>
  <source pool='vhbapool_host3' volume='unit:0:4:0' mode='host'/>
  <target dev='sda' bus='scsi'/>
  <shareable />
</disk>
For XML configuration examples of adding SCSI LUN-based storage to a guest, see Section 13.3.6.3, “Adding SCSI LUN-based Storage to a Guest”.
Note that to ensure successful reconnection to a LUN in case of a hardware failure, it is recommended that you edit the fast_io_fail_tmo and dev_loss_tmo options. For more information, see Reconnecting to an exposed LUN after a hardware failure.

13.2.4. Deleting Storage Pools

You can delete storage pools using virsh or the Virtual Machine Manager.

13.2.4.1. Prerequisites for deleting a storage pool

To avoid negatively affecting other guest virtual machines that use the storage pool you want to delete, it is recommended that you stop the storage pool and release any resources being used by it.

13.2.4.2. Deleting storage pools using virsh

  1. List the defined storage pools:
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    guest_images_pool    active     yes
    
  2. Stop the storage pool you want to delete.
    # virsh pool-destroy guest_images_disk
  3. (Optional) For some types of storage pools, you can remove the directory where the storage pool resides:
    # virsh pool-delete guest_images_disk
  4. Remove the storage pool's definition.
    # virsh pool-undefine guest_images_disk
  5. Confirm the pool is undefined:
    # virsh pool-list --all
    Name                 State      Autostart
    -----------------------------------------
    default              active     yes
    
    

13.2.4.3. Deleting storage pools using Virtual Machine Manager

  1. Select the storage pool you want to delete in the storage pool list in the Storage tab of the Connection Details window.
  2. Click the stop button at the bottom of the Storage window. This stops the storage pool and releases any resources in use by it.
  3. Click the delete button.

    Note

    The delete button is only enabled if the storage pool is stopped.
    The storage pool is deleted.

13.3. Using Storage Volumes

This section provides information about using storage volumes. It provides conceptual information, as well as detailed instructions on creating, configuring, and deleting storage volumes using virsh commands and the Virtual Machine Manager.

13.3.1. Storage Volume Concepts

Storage pools are divided into storage volumes. Storage volumes are abstractions of physical partitions, LVM logical volumes, file-based disk images, and other storage types handled by libvirt. Storage volumes are presented to guest virtual machines as local storage devices regardless of the underlying hardware.

Note

The sections below do not contain all of the possible commands and arguments that virsh provides for managing storage volumes. For more information, see Section 20.30, “Storage Volume Commands”.
On the host machine, a storage volume is referred to by its name and an identifier for the storage pool from which it derives. On the virsh command line, this takes the form --pool storage_pool volume_name.
For example, to display information about a volume named firstimage in the guest_images pool:
# virsh vol-info --pool guest_images firstimage
  Name:           firstimage
  Type:           block
  Capacity:       20.00 GB
  Allocation:     20.00 GB
For additional parameters and arguments, see Section 20.34, “Listing Volume Information”.

13.3.2. Creating Storage Volumes

This section provides general instructions for creating storage volumes from storage pools using virsh and the Virtual Machine Manager. After creating storage volumes, you can add storage devices to guests.

13.3.2.1. Creating Storage Volumes with virsh

Do one of the following:
  • Define the storage volume using an XML file.
    a. Create a temporary XML file containing the storage volume information required for the new device.
    The XML file must contain specific fields including the following:
    • name - The name of the storage volume.
    • allocation - The total storage allocation for the storage volume.
    • capacity - The logical capacity of the storage volume. If the volume is sparse, this value can differ from the allocation value.
    • target - The path to the storage volume on the host system and optionally its permissions and label.
    The following shows an example of a storage volume definition XML file. In this example, the file is saved to ~/guest_volume.xml:
      <volume>
        <name>volume1</name>
        <allocation>0</allocation>
        <capacity unit='G'>20</capacity>
        <target>
          <path>/var/lib/virt/images/sparse.img</path>
        </target>
      </volume> 
    b. Use the virsh vol-create command to create the storage volume based on the XML file.
    # virsh vol-create guest_images_dir ~/guest_volume.xml
      Vol volume1 created
    
    c. Delete the XML file created in step a.
  • Use the virsh vol-create-as command to create the storage volume.
    # virsh vol-create-as guest_images_dir volume1 20GB --allocation 0
  • Clone an existing storage volume using the virsh vol-clone command. The virsh vol-clone command must specify the storage pool that contains the storage volume to clone and the name of the newly created storage volume.
    # virsh vol-clone --pool guest_images_dir volume1 clone1

13.3.2.2. Creating storage volumes with Virtual Machine Manager

Procedure 13.11. Creating Storage Volumes with Virtual Machine Manager

  1. Open the storage settings

    1. In the Virtual Machine Manager, open the Edit menu and select Connection Details.
    2. Click the Storage tab in the Connection Details window.
      Storage tab

      Figure 13.11. Storage tab

      The pane on the left of the Connection Details window shows a list of storage pools.
  2. Select the storage pool in which you want to create a storage volume

    In the list of storage pools, click the storage pool in which you want to create the storage volume.
    Any storage volumes configured on the selected storage pool appear in the Volumes pane at the bottom of the window.
  3. Add a new storage volume

    Click the plus (+) button above the Volumes list. The Add a Storage Volume dialog appears.
    Create storage volume

    Figure 13.12. Create storage volume

  4. Configure the storage volume

    Configure the storage volume with the following parameters:
    • Enter a name for the storage volume in the Name field.
    • Select a format for the storage volume from the Format list.
    • Enter the maximum size for the storage volume in the Max Capacity field.
  5. Finish the creation

    Click Finish. The Add a Storage Volume dialog closes, and the storage volume appears in the Volumes list.

13.3.3. Viewing Storage Volumes

You can create multiple storage volumes from a storage pool. You can also use the virsh vol-list command to list the storage volumes in a storage pool. In the following example, the guest_images_disk storage pool contains three volumes.
# virsh vol-create-as guest_images_disk volume1 8G
Vol volume1 created

# virsh vol-create-as guest_images_disk volume2 8G
Vol volume2 created

# virsh vol-create-as guest_images_disk volume3 8G
Vol volume3 created

# virsh vol-list guest_images_disk
Name                 Path
-----------------------------------------
volume1              /home/VirtualMachines/guest_images_dir/volume1
volume2              /home/VirtualMachines/guest_images_dir/volume2
volume3              /home/VirtualMachines/guest_images_dir/volume3

13.3.4. Managing Data

This section provides information about managing the data on storage volumes.

Note

Some types of storage volumes do not support all of the data management commands. For specific information, see the sections below.

13.3.4.1. Wiping Storage Volumes

To ensure that data on a storage volume cannot be accessed, a storage volume can be wiped using the virsh vol-wipe command.
Use the virsh vol-wipe command to wipe a storage volume:
# virsh vol-wipe new-vol vdisk
By default, the data is overwritten by zeroes. However, there are a number of different methods that can be specified for wiping the storage volume. For detailed information about all of the options for the virsh vol-wipe command, refer to Section 20.32, “Deleting a Storage Volume's Contents”.
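For instance, assuming the same new-vol volume and vdisk pool as in the example above, a different wiping method can be selected with the --algorithm option (a sketch; see the referenced section for the list of supported algorithms):
# virsh vol-wipe --pool vdisk --algorithm random new-vol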

13.3.4.2. Uploading Data to a Storage Volume

You can upload data from a specified local file to a storage volume using the virsh vol-upload command.
# virsh vol-upload --pool pool-or-uuid --offset bytes --length bytes vol-name-or-key-or-path local-file
The following are the main virsh vol-upload options:
  • --pool pool-or-uuid - The name or UUID of the storage pool the volume is in.
  • vol-name-or-key-or-path - The name or key or path of the volume to upload.
  • --offset bytes - The position in the storage volume at which to start writing the data.
  • --length length - An upper limit for the amount of data to be uploaded.

    Note

    An error will occur if local-file is greater than the specified --length.

Example 13.1. Uploading data to a storage volume

# virsh vol-upload sde1 /tmp/data500m.empty --pool disk-pool
In this example, sde1 is a volume in the disk-pool storage pool. The data in /tmp/data500m.empty is copied to sde1.
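To upload only part of a file, the --offset and --length options described above can be combined. The following is a minimal sketch, reusing the same hypothetical sde1 volume and disk-pool pool, that writes the first 1 MiB of the local file starting at the beginning of the volume:
# virsh vol-upload --pool disk-pool --offset 0 --length 1048576 sde1 /tmp/data500m.empty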

13.3.4.3. Downloading Data from a Storage Volume

You can download data from a storage volume to a specified local file using the virsh vol-download command.
# virsh vol-download --pool pool-or-uuid --offset bytes --length bytes vol-name-or-key-or-path local-file
The following are the main virsh vol-download options:
  • --pool pool-or-uuid - The name or UUID of the storage pool that the volume is in.
  • vol-name-or-key-or-path - The name, key, or path of the volume to download.
  • --offset - The position in the storage volume at which to start reading the data.
  • --length length - An upper limit for the amount of data to be downloaded.

Example 13.2. Downloading data from a storage volume

# virsh vol-download sde1 /tmp/data-sde1.tmp --pool disk-pool
In this example, sde1 is a volume in the disk-pool storage pool. The data in sde1 is downloaded to /tmp/data-sde1.tmp.

13.3.4.4. Resizing Storage Volumes

You can resize the capacity of a specified storage volume using the vol-resize command.
# virsh vol-resize --pool pool-or-uuid vol-name-or-key-or-path capacity [--allocate] [--delta] [--shrink]
The capacity is expressed in bytes. The command requires --pool pool-or-uuid which is the name or UUID of the storage pool the volume is in. This command also requires vol-name-or-key-or-path, the name, key, or path of the volume to resize.
The new capacity might be sparse unless --allocate is specified. Normally, capacity is the new size, but if --delta is present, then it is added to the existing size. Attempts to shrink the volume will fail unless --shrink is present.
Note that capacity cannot be negative unless --shrink is provided, and a negative sign is not necessary. capacity is a scaled integer which defaults to bytes if there is no suffix. In addition, note that this command is only safe for storage volumes not in use by an active guest. Refer to Section 20.13.3, “Changing the Size of a Guest Virtual Machine's Block Device” for live resizing.

Example 13.3. Resizing a storage volume

For example, if you created a 50M storage volume, you can resize it to 100M with the following command:
# virsh vol-resize --pool disk-pool sde1 100M
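Alternatively, the --delta option adds the specified amount to the existing size. The following is a sketch that grows the same hypothetical sde1 volume by a further 50M:
# virsh vol-resize --pool disk-pool sde1 50M --delta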

13.3.5. Deleting Storage Volumes

You can delete storage volumes using virsh or the Virtual Machine Manager.

Note

To avoid negatively affecting guest virtual machines that use the storage volume you want to delete, it is recommended that you release any resources using it.

13.3.5.1. Deleting storage volumes using virsh

Delete a storage volume using the virsh vol-delete command. The command must specify the name or path of the storage volume and the storage pool from which the storage volume is abstracted.
The following example deletes the volume_name storage volume from the guest_images_dir storage pool:
# virsh vol-delete volume_name --pool guest_images_dir
  vol volume_name deleted

13.3.5.2. Deleting storage volumes using Virtual Machine Manager

Procedure 13.12. Deleting Storage Volumes with Virtual Machine Manager

  1. Open the storage settings

    1. In the Virtual Machine Manager, open the Edit menu and select Connection Details.
    2. Click the Storage tab in the Connection Details window.
      Storage tab

      Figure 13.13. Storage tab

      The pane on the left of the Connection Details window shows a list of storage pools.
  2. Select the storage volume you want to delete

    1. In the list of storage pools, click the storage pool from which the storage volume is abstracted.
      A list of storage volumes configured on the selected storage pool appear in the Volumes pane at the bottom of the window.
    2. Select the storage volume you want to delete.
  3. Delete the storage volume

    1. Click the button (above the Volumes list). A confirmation dialog appears.
    2. Click Yes. The selected storage volume is deleted.

13.3.6. Adding Storage Devices to Guests

You can add storage devices to guest virtual machines using virsh or Virtual Machine Manager.

13.3.6.1. Adding Storage Devices to Guests Using virsh

To add storage devices to a guest virtual machine, use the attach-disk or attach-device command. The arguments that contain information about the disk to add can be specified on the attach-disk command line, or in an XML file that is passed to attach-device.
The following is a sample XML file with the definition of the storage.
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/libvirt/images/FileName.img'/>
  <target dev='vdb' bus='virtio'/>
</disk> 
The following command attaches the disk defined in an XML file called NewStorage.xml to Guest1.
# virsh attach-device --config Guest1 ~/NewStorage.xml
The following command attaches a disk to Guest1 without using an XML file.
# virsh attach-disk --config Guest1 --source /var/lib/libvirt/images/FileName.img --target vdb
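If you want the command-line form to match the driver settings shown in the XML example above (the qemu driver, raw format, and caching disabled), the attach-disk driver, subdriver, and cache options can be added. The following is a sketch using the same hypothetical guest and image file:
# virsh attach-disk --config Guest1 --source /var/lib/libvirt/images/FileName.img --target vdb --driver qemu --subdriver raw --cache none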

13.3.6.2. Adding Storage Devices to Guests Using Virtual Machine Manager

You can add a storage volume to a guest virtual machine or create and add a default storage device to a guest virtual machine.
13.3.6.2.1. Adding a storage volume to a guest
To add a storage volume to a guest virtual machine:
  1. Open Virtual Machine Manager to the virtual machine hardware details window

    Open virt-manager by executing the virt-manager command as root or opening Applications → System Tools → Virtual Machine Manager.
    The Virtual Machine Manager window

    Figure 13.14. The Virtual Machine Manager window

    Select the guest virtual machine to which you want to add a storage volume.
    Click Open. The Virtual Machine window opens.
    Click the Show virtual hardware details button. The hardware details window appears.
    The Hardware Details window

    Figure 13.15. The Hardware Details window

  2. Open the Add New Virtual Hardware window

    Click Add Hardware. The Add New Virtual Hardware window appears.
    Ensure that Storage is selected in the hardware type pane.
    The Add New Virtual Hardware window

    Figure 13.16. The Add New Virtual Hardware window

  3. View a list of storage volumes

    Select the Select or create custom storage option button.
    Click Manage. The Choose Storage Volume dialog appears.
    The Select Storage Volume window

    Figure 13.17. The Select Storage Volume window

  4. Select a storage volume

    Select a storage pool from the list on the left side of the Select Storage Volume window. A list of storage volumes in the selected storage pool appears in the Volumes list.

    Note

    You can create a storage pool from the Select Storage Volume window. For more information, see Section 13.2.2.2, “Creating storage pools with Virtual Machine Manager”.
    Select a storage volume from the Volumes list.

    Note

    You can create a storage volume from the Select Storage Volume window. For more information, see Section 13.3.2.2, “Creating storage volumes with Virtual Machine Manager”.
    Click Choose Volume. The Select Storage Volume window closes.
  5. Configure the storage volume

    Select a device type from the Device type list. Available types are: Disk device, Floppy device, and LUN Passthrough.
    Select a bus type from the Bus type list. The available bus types are dependent on the selected device type.
    Select a cache mode from the Cache mode list. Available cache modes are: Hypervisor default, none, writethrough, writeback, directsync, unsafe
    Click Finish. The Add New Virtual Hardware window closes.
13.3.6.2.2. Adding default storage to a guest
The default storage pool uses file-based images in the /var/lib/libvirt/images/ directory.
To add default storage to a guest virtual machine:
  1. Open Virtual Machine Manager to the virtual machine hardware details window

    Open virt-manager by executing the virt-manager command as root or opening Applications → System Tools → Virtual Machine Manager.
    The Virtual Machine Manager window

    Figure 13.18. The Virtual Machine Manager window

    Select the guest virtual machine to which you want to add a storage volume.
    Click Open. The Virtual Machine window opens.
    Click the Show virtual hardware details button. The hardware details window appears.
    The Hardware Details window

    Figure 13.19. The Hardware Details window

  2. Open the Add New Virtual Hardware window

    Click Add Hardware. The Add New Virtual Hardware window appears.
    Ensure that Storage is selected in the hardware type pane.
    The Add New Virtual Hardware window

    Figure 13.20. The Add New Virtual Hardware window

  3. Create a disk for the guest

    Ensure that the Create a disk image for the virtual machine option is selected.
    Enter the size of the disk to create in the textbox below the Create a disk image for the virtual machine option button.
    Click Finish. The Add New Virtual Hardware window closes.

13.3.6.3. Adding SCSI LUN-based Storage to a Guest

There are multiple ways to expose a host SCSI LUN entirely to the guest. Exposing the SCSI LUN to the guest provides the capability to execute SCSI commands directly to the LUN on the guest. This is useful as a means to share a LUN between guests, as well as to share Fibre Channel storage between hosts.
For more information on SCSI LUN-based storage, see vHBA-based storage pools using SCSI devices.

Important

The optional sgio attribute controls whether unprivileged SCSI Generic I/O (SG_IO) commands are filtered for a device='lun' disk. The sgio attribute can be specified as 'filtered' or 'unfiltered', but must be set to 'unfiltered' to allow SG_IO ioctl commands to be passed through on the guest for persistent reservations.
In addition to setting sgio='unfiltered', the <shareable> element must be set to share a LUN between guests. The sgio attribute defaults to 'filtered' if not specified.
The <disk> XML attribute device='lun' is valid for the following guest disk configurations:
  • type='block' for <source dev='/dev/disk/by-{path|id|uuid|label}'/>
    <disk type='block' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/pci-0000\:04\:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
    </disk>

    Note

    The backslashes prior to the colons in the <source> device name are required.
  • type='network' for <source protocol='iscsi'... />
    <disk type='network' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi' name='iqn.2013-07.com.example:iscsi-net-pool/1'>
        <host name='example.com' port='3260'/>
        <auth username='myuser'>
          <secret type='iscsi' usage='libvirtiscsi'/>
        </auth>
      </source>
      <target dev='sda' bus='scsi'/>
      <shareable/>
    </disk> 
  • type='volume' when using an iSCSI or a NPIV/vHBA source pool as the SCSI source pool.
    The following example XML shows a guest using an iSCSI source pool (named iscsi-net-pool) as the SCSI source pool:
    <disk type='volume' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-net-pool' volume='unit:0:0:1' mode='host'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
    </disk> 

    Note

    The mode= option within the <source> tag is optional, but if used, it must be set to 'host' and not 'direct'. When set to 'host', libvirt will find the path to the device on the local host. When set to 'direct', libvirt will generate the path to the device using the source pool's source host data.
    The iSCSI pool (iscsi-net-pool) in the example above will have a similar configuration to the following:
    # virsh pool-dumpxml iscsi-net-pool
    <pool type='iscsi'>
      <name>iscsi-net-pool</name>
      <capacity unit='bytes'>11274289152</capacity>
      <allocation unit='bytes'>11274289152</allocation>
      <available unit='bytes'>0</available>
      <source>
        <host name='192.168.122.1' port='3260'/>
        <device path='iqn.2013-12.com.example:iscsi-chap-netpool'/>
        <auth type='chap' username='redhat'>
          <secret usage='libvirtiscsi'/>
        </auth>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0755</mode>
        </permissions>
      </target>
    </pool> 
    To verify the details of the available LUNs in the iSCSI source pool, enter the following command:
    # virsh vol-list iscsi-net-pool
    Name                 Path
    ------------------------------------------------------------------------------
    unit:0:0:1           /dev/disk/by-path/ip-192.168.122.1:3260-iscsi-iqn.2013-12.com.example:iscsi-chap-netpool-lun-1
    unit:0:0:2           /dev/disk/by-path/ip-192.168.122.1:3260-iscsi-iqn.2013-12.com.example:iscsi-chap-netpool-lun-2
  • type='volume' when using a NPIV/vHBA source pool as the SCSI source pool.
    The following example XML shows a guest using a NPIV/vHBA source pool (named vhbapool_host3) as the SCSI source pool:
    <disk type='volume' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:1:0'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
    </disk>  
    The NPIV/vHBA pool (vhbapool_host3) in the example above will have a similar configuration to:
    # virsh pool-dumpxml vhbapool_host3
    <pool type='scsi'>
      <name>vhbapool_host3</name>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <adapter type='fc_host' parent='scsi_host3' managed='yes' wwnn='5001a4a93526d0a1' wwpn='5001a4ace3ee045d'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0700</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>  
    To verify the details of the available LUNs on the vHBA, enter the following command:
    # virsh vol-list vhbapool_host3
    Name                 Path
    ------------------------------------------------------------------------------
    unit:0:0:0           /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016044602198-lun-0
    unit:0:1:0           /dev/disk/by-path/pci-0000:10:00.0-fc-0x5006016844602198-lun-0
    For more information on using a NPIV vHBA with SCSI devices, see Section 13.2.3.8, “vHBA-based storage pools using SCSI devices”.
The following procedure shows an example of adding a SCSI LUN-based storage device to a guest. Any of the above <disk device='lun'> guest disk configurations can be attached with this method. Substitute configurations according to your environment.

Procedure 13.13. Attaching SCSI LUN-based storage to a guest

  1. Create the device file by writing a <disk> element in a new file, and save this file with an XML extension (in this example, sda.xml):
    # cat sda.xml
    <disk type='volume' device='lun' sgio='unfiltered'>
      <driver name='qemu' type='raw'/>
      <source pool='vhbapool_host3' volume='unit:0:1:0'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
    </disk>  
  2. Associate the device created in sda.xml with your guest virtual machine (Guest1, for example):
    # virsh attach-device --config Guest1 ~/sda.xml

    Note

    Running the virsh attach-device command with the --config option requires a guest reboot to add the device permanently to the guest. Alternatively, the --persistent option can be used instead of --config, which can also be used to hotplug the device to a guest.
Alternatively, the SCSI LUN-based storage can be attached or configured on the guest using virt-manager. To configure this using virt-manager, click the Add Hardware button and add a virtual disk with the required parameters, or change the settings of an existing SCSI LUN device from this window. In Red Hat Enterprise Linux 7.2 and above, the SGIO value can also be configured in virt-manager:
Configuring SCSI LUN storage with virt-manager

Figure 13.21. Configuring SCSI LUN storage with virt-manager

Reconnecting to an exposed LUN after a hardware failure

If the connection to an exposed Fiber Channel (FC) LUN is lost due to a failure of hardware (such as the host bus adapter), the exposed LUNs on the guest may continue to appear as failed even after the hardware failure is fixed. To prevent this, edit the dev_loss_tmo and fast_io_fail_tmo kernel options:
  • dev_loss_tmo controls how long the SCSI layer waits after a SCSI device fails before marking it as failed. To prevent a timeout, it is recommended to set the option to the maximum value, which is 2147483647.
  • fast_io_fail_tmo controls how long the SCSI layer waits after a SCSI device fails before failing the I/O back to the layers above. To ensure that dev_loss_tmo is not ignored by the kernel, set this option's value to any number lower than the value of dev_loss_tmo.
To modify the values of dev_loss_tmo and fast_io_fail_tmo, do one of the following:
  • Edit the /etc/multipath.conf file, and set the values in the defaults section:
    defaults {
    ...
    fast_io_fail_tmo     20
    dev_loss_tmo    infinity
    }
    
  • Set dev_loss_tmo and fast_io_fail_tmo at the level of the FC host or remote port, for example as follows:
    # echo 20 > /sys/devices/pci0000:00/0000:00:06.0/0000:13:00.0/host1/rport-1:0-0/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo
    # echo 2147483647 > /sys/devices/pci0000:00/0000:00:06.0/0000:13:00.0/host1/rport-1:0-0/fc_remote_ports/rport-1:0-0/dev_loss_tmo
To verify that the new values of dev_loss_tmo and fast_io_fail_tmo are active, use the following command:
# find /sys -name dev_loss_tmo -print -exec cat {} \;
If the parameters have been set correctly, the output will look similar to this, with the appropriate device or devices instead of pci0000:00/0000:00:06.0/0000:13:00.0/host1/rport-1:0-0/fc_remote_ports/rport-1:0-0:
# find /sys -name dev_loss_tmo -print -exec cat {} \;
...
/sys/devices/pci0000:00/0000:00:06.0/0000:13:00.0/host1/rport-1:0-0/fc_remote_ports/rport-1:0-0/dev_loss_tmo
2147483647
...

13.3.6.4. Managing Storage Controllers in a Guest Virtual Machine

Unlike virtio disks, SCSI devices require the presence of a controller in the guest virtual machine. This section details the necessary steps to create a virtual SCSI controller (also known as "Host Bus Adapter", or HBA), and to add SCSI storage to the guest virtual machine.

Procedure 13.14. Creating a virtual SCSI controller

  1. Display the configuration of the guest virtual machine (Guest1) and look for a pre-existing SCSI controller:
    # virsh dumpxml Guest1 | grep 'controller.*scsi'
    If a device controller is present, the command will output one or more lines similar to the following:
    <controller type='scsi' model='virtio-scsi' index='0'/>
    
  2. If the previous step did not show a device controller, create the description for one in a new file and add it to the virtual machine, using the following steps:
    1. Create the device controller by writing a <controller> element in a new file and save this file with an XML extension. virtio-scsi-controller.xml, for example.
      <controller type='scsi' model='virtio-scsi'/>
      
    2. Associate the device controller you just created in virtio-scsi-controller.xml with your guest virtual machine (Guest1, for example):
      # virsh attach-device --config Guest1 ~/virtio-scsi-controller.xml
      In this example the --config option behaves the same as it does for disks. See Section 13.3.6, “Adding Storage Devices to Guests” for more information.
  3. Add a new SCSI disk or CD-ROM. The new disk can be added using the methods in Section 13.3.6, “Adding Storage Devices to Guests”. In order to create a SCSI disk, specify a target device name that starts with sd.

    Note

    The supported limit for each controller is 1024 virtio-scsi disks, but it is possible that other available resources in the host (such as file descriptors) are exhausted with fewer disks.
    For more information, see the following Red Hat Enterprise Linux 6 whitepaper: The next-generation storage interface for the Red Hat Enterprise Linux Kernel Virtual Machine: virtio-scsi.
    # virsh attach-disk Guest1 /var/lib/libvirt/images/FileName.img sdb --cache none
    Depending on the version of the driver in the guest virtual machine, the new disk may not be detected immediately by a running guest virtual machine. Follow the steps in the Red Hat Enterprise Linux Storage Administration Guide.

13.3.7. Removing Storage Devices from Guests

You can remove storage devices from guest virtual machines using virsh or Virtual Machine Manager.

13.3.7.1. Removing Storage from a Virtual Machine with virsh

The following example removes the vdb storage volume from the Guest1 virtual machine:
# virsh detach-disk Guest1 vdb
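To make the removal persistent across guest reboots, the --config option can be added, similarly to attaching a disk. The following is a sketch using the same hypothetical guest and target:
# virsh detach-disk --config Guest1 vdb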

13.3.7.2. Removing Storage from a Virtual Machine with Virtual Machine Manager

Procedure 13.15. Removing storage from a virtual machine with Virtual Machine Manager

To remove storage from a guest virtual machine using Virtual Machine Manager:
  1. Open Virtual Machine Manager to the virtual machine hardware details window

    Open virt-manager by executing the virt-manager command as root or opening Applications → System Tools → Virtual Machine Manager.
    Select the guest virtual machine from which you want to remove a storage device.
    Click Open. The Virtual Machine window opens.
    Click the Show virtual hardware details button. The hardware details window appears.
  2. Remove the storage from the guest virtual machine

    Select the storage device from the list of hardware on the left side of the hardware details pane.
    Click Remove. A confirmation dialog appears.
    Click Yes. The storage is removed from the guest virtual machine.

Chapter 14. Using qemu-img

The qemu-img command-line tool is used for formatting, modifying, and verifying various file systems used by KVM. qemu-img options and usages are highlighted in the sections that follow.

Warning

Never use qemu-img to modify images in use by a running virtual machine or any other process. This may destroy the image. Also, be aware that querying an image that is being modified by another process may encounter inconsistent state.

14.1. Checking the Disk Image

To perform a consistency check on a disk image with the file name imgname, use the following command:
# qemu-img check [-f format] imgname
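For example, a consistency check on a hypothetical qcow2 image might look like the following:
# qemu-img check -f qcow2 /var/lib/libvirt/images/guest1.qcow2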

Note

Only a selected group of formats support consistency checks. These include qcow2, vdi, vhdx, vmdk, and qed.

14.2. Committing Changes to an Image

Commit any changes recorded in the specified image file (imgname) to the file's base image with the qemu-img commit command. Optionally, specify the file's format type (fmt).
 # qemu-img commit [-f fmt] [-t cache] imgname
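For example, assuming a hypothetical qcow2 overlay image whose changes should be merged into its base image:
# qemu-img commit -f qcow2 /var/lib/libvirt/images/overlay.qcow2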

14.3. Comparing Images

Compare the contents of two specified image files (imgname1 and imgname2) with the qemu-img compare command. Optionally, specify the files' format types (fmt). The images can have different formats and settings.
By default, images with different sizes are considered identical if the larger image contains only unallocated or zeroed sectors in the area after the end of the other image. In addition, if any sector is not allocated in one image and contains only zero bytes in the other one, it is evaluated as equal. If you specify the -s option, the images are not considered identical if the image sizes differ or a sector is allocated in one image and is not allocated in the second one.
 # qemu-img compare [-f fmt] [-F fmt] [-p] [-s] [-q] imgname1 imgname2
The qemu-img compare command exits with one of the following exit codes:
  • 0 - The images are identical
  • 1 - The images are different
  • 2 - There was an error opening one of the images
  • 3 - There was an error checking a sector allocation
  • 4 - There was an error reading the data
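For example, the following sketch compares a hypothetical raw image with a qcow2 image and then prints the exit code:
# qemu-img compare -f raw -F qcow2 guest1.img guest1.qcow2
# echo $?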

14.4. Mapping an Image

Using the qemu-img map command, you can dump the metadata of the specified image file (imgname) and its backing file chain. The dump shows the allocation state of every sector in imgname, along with the topmost file that allocates it in the backing file chain. Optionally, specify the file's format type (fmt).
 # qemu-img map [-f fmt] [--output=fmt] imgname
There are two output formats, the human format and the json format:

14.4.1. The human Format

The default format (human) only dumps non-zero, allocated parts of the file. The output identifies a file from where data can be read and the offset in the file. Each line includes four fields. The following shows an example of an output:
Offset          Length          Mapped to       File
0               0x20000         0x50000         /tmp/overlay.qcow2
0x100000        0x10000         0x95380000      /tmp/backing.qcow2
The first line means that 0x20000 (131072) bytes starting at offset 0 in the image are available in /tmp/overlay.qcow2 (opened in raw format) starting at offset 0x50000 (327680). Data that is compressed, encrypted, or otherwise not available in raw format causes an error if human format is specified.

Note

File names can include newline characters. Therefore, it is not safe to parse output in human format in scripts.

14.4.2. The json Format

If the json option is specified, the output returns an array of dictionaries in JSON format. In addition to the information provided in the human option, the output includes the following information:
  • data - A Boolean field that shows whether or not the sectors contain data
  • zero - A Boolean field that shows whether or not the data is known to read as zero
  • depth - The depth of the backing file of filename
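For example, to dump the map of a hypothetical image in the json format:
# qemu-img map --output=json /var/lib/libvirt/images/guest1.qcow2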

Note

When the json option is specified, the offset field is optional.
For more information about the qemu-img map command and additional options, see the relevant man page.

14.5. Amending an Image

Amend the image format-specific options for the image file. Optionally, specify the file's format type (fmt).
# qemu-img amend [-p] [-f fmt] [-t cache] -o options filename
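For example, the following is a sketch that changes the compatibility level of a hypothetical qcow2 image, assuming the compat option is available for the qcow2 format:
# qemu-img amend -f qcow2 -o compat=0.10 /var/lib/libvirt/images/guest1.qcow2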

Note

This operation is only supported for the qcow2 file format.

14.6. Converting an Existing Image to Another Format

The convert option is used to convert one recognized image format to another image format. For a list of accepted formats, see Section 14.12, “Supported qemu-img Formats”.
# qemu-img convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-S sparse_size] filename output_filename
The -p parameter shows the progress of the command (optional and not available for every command), and the -S option allows for the creation of a sparse file within the disk image. A sparse file functions like a standard file for all purposes, except that physical blocks containing only zeros do not consume disk space. When the operating system sees this file, it treats it as if it occupies its full size, even though in reality it does not. This is particularly helpful when creating a disk for a guest virtual machine, as the disk appears to be much larger than the disk space it actually uses. For example, if you set -S to 50Gb on a disk image that is 10Gb, your 10Gb of disk space appears to be 60Gb in size even though only 10Gb is actually used.
Convert the disk image filename to disk image output_filename using format output_format. The disk image can be optionally compressed with the -c option, or encrypted with the -o option by setting -o encryption. Note that the options available with the -o parameter differ with the selected format.
Only the qcow2 format supports encryption or compression. qcow2 encryption uses the AES format with secure 128-bit keys. qcow2 compression is read-only, so if a compressed sector is converted from qcow2 format, it is written to the new format as uncompressed data.
Image conversion is also useful to get a smaller image when using a format which can grow, such as qcow or cow. The empty sectors are detected and suppressed from the destination image.
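For example, to convert a hypothetical raw image to a compressed qcow2 image while displaying progress:
# qemu-img convert -p -c -f raw -O qcow2 guest1.img guest1.qcow2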

14.7. Creating and Formatting New Images or Devices

Create the new disk image filename of size size and format format.
# qemu-img create [-f format] [-o options] filename [size]
If a base image is specified with -o backing_file=filename, the image will only record differences between itself and the base image. The backing file will not be modified unless you use the commit command. No size needs to be specified in this case.
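For example, the following sketches create a hypothetical 10 GB qcow2 image, and then an overlay image that records only the differences from that base image:
# qemu-img create -f qcow2 /var/lib/libvirt/images/guest1.qcow2 10G
# qemu-img create -f qcow2 -o backing_file=/var/lib/libvirt/images/guest1.qcow2 /var/lib/libvirt/images/guest1-overlay.qcow2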

14.8. Displaying Image Information

The info parameter displays information about a disk image filename. The format for the info option is as follows:
# qemu-img info [-f format] filename
This command is often used to discover the size reserved on disk, which can differ from the displayed size. If snapshots are stored in the disk image, they are also displayed. For example, the command shows how much space a qcow2 image occupies on a block device. You can check that the image in use is the one that matches the output of the qemu-img info command with the qemu-img check command.
# qemu-img info /dev/vg-90.100-sluo/lv-90-100-sluo
image: /dev/vg-90.100-sluo/lv-90-100-sluo
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 0
cluster_size: 65536

14.9. Rebasing a Backing File of an Image

The qemu-img rebase changes the backing file of an image.
# qemu-img rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename
The backing file is changed to backing_file and (if the format of filename supports the feature), the backing file format is changed to backing_fmt.

Note

Only the qcow2 format supports changing the backing file (rebase).
There are two different modes in which rebase can operate: safe and unsafe.
safe mode is used by default and performs a real rebase operation. The new backing file may differ from the old one and the qemu-img rebase command will take care of keeping the guest virtual machine-visible content of filename unchanged. In order to achieve this, any clusters that differ between backing_file and old backing file of filename are merged into filename before making any changes to the backing file.
Note that safe mode is an expensive operation, comparable to converting an image. The old backing file is required for it to complete successfully.
unsafe mode is used if the -u option is passed to qemu-img rebase. In this mode, only the backing file name and format of filename is changed, without any checks taking place on the file contents. Make sure the new backing file is specified correctly or the guest-visible content of the image will be corrupted.
This mode is useful for renaming or moving the backing file. It can be used without an accessible old backing file. For instance, it can be used to fix an image whose backing file has already been moved or renamed.
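For example, the following sketch shows a safe rebase onto a new backing file, followed by an unsafe rebase used after a backing file has simply been renamed (all file names are hypothetical):
# qemu-img rebase -b /var/lib/libvirt/images/new-base.qcow2 /var/lib/libvirt/images/overlay.qcow2
# qemu-img rebase -u -b /var/lib/libvirt/images/renamed-base.qcow2 /var/lib/libvirt/images/overlay.qcow2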

14.10. Re-sizing the Disk Image

Change the disk image filename as if it had been created with size size. Only images in raw format can be resized in both directions, whereas qcow2 images can be grown but cannot be shrunk.
Use the following to set the size of the disk image filename to size bytes:
# qemu-img resize filename size
You can also resize relative to the current size of the disk image. To give a size relative to the current size, prefix the number of bytes with + to grow, or - to reduce the size of the disk image by that number of bytes. Adding a unit suffix allows you to set the image size in kilobytes (K), megabytes (M), gigabytes (G) or terabytes (T).
# qemu-img resize filename [+|-]size[K|M|G|T]
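For example, to grow a hypothetical raw image by 10 gigabytes:
# qemu-img resize guest1.img +10G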

Warning

Before using this command to shrink a disk image, you must use file system and partitioning tools inside the VM itself to reduce allocated file systems and partition sizes accordingly. Failure to do so will result in data loss.
After using this command to grow a disk image, you must use file system and partitioning tools inside the VM to actually begin using the new space on the device.

14.11. Listing, Creating, Applying, and Deleting a Snapshot

Using different parameters of the qemu-img snapshot command, you can list, apply, create, or delete an existing snapshot (snapshot) of a specified image (filename).
# qemu-img snapshot [ -l | -a snapshot | -c snapshot | -d snapshot ] filename
The accepted arguments are as follows:
  • -l lists all snapshots associated with the specified disk image.
  • The apply option, -a, reverts the disk image (filename) to the state of a previously saved snapshot.
  • -c creates a snapshot (snapshot) of an image (filename).
  • -d deletes the specified snapshot.
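For example, the following sketch creates a snapshot of a hypothetical qcow2 image, lists the image's snapshots, and later reverts the image to that snapshot:
# qemu-img snapshot -c clean-install /var/lib/libvirt/images/guest1.qcow2
# qemu-img snapshot -l /var/lib/libvirt/images/guest1.qcow2
# qemu-img snapshot -a clean-install /var/lib/libvirt/images/guest1.qcow2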

14.12. Supported qemu-img Formats

When a format is specified in any of the qemu-img commands, the following format types may be used:
  • raw - Raw disk image format (default). This can be the fastest file-based format. If your file system supports holes (for example in ext2 or ext3 ), then only the written sectors will reserve space. Use qemu-img info to obtain the real size used by the image or ls -ls on Unix/Linux. Although Raw images give optimal performance, only very basic features are available with a Raw image. For example, no snapshots are available.
  • qcow2 - QEMU image format, the most versatile format with the best feature set. Use it to have optional AES encryption, zlib-based compression, support of multiple VM snapshots, and smaller images, which are useful on file systems that do not support holes . Note that this expansive feature set comes at the cost of performance.
    Although only the formats above can be used to run on a guest virtual machine or host physical machine, qemu-img also recognizes and supports the following formats in order to convert from them into either raw or qcow2 format. The format of an image is usually detected automatically. In addition to converting these formats into raw or qcow2, they can be converted back from raw or qcow2 to the original format. Note that the qcow2 version supplied with Red Hat Enterprise Linux 7 is 1.1. The format supplied with previous versions of Red Hat Enterprise Linux is 0.10. You can revert image files to previous versions of qcow2. To know which version you are using, run the qemu-img info imagefilename.img command. To change the qcow version, see Section 23.19.2, “Setting Target Elements”.
  • bochs - Bochs disk image format.
  • cloop - Linux Compressed Loop image, useful only to reuse directly compressed CD-ROM images present for example in the Knoppix CD-ROMs.
  • cow - User Mode Linux Copy On Write image format. The cow format is included only for compatibility with previous versions.
  • dmg - Mac disk image format.
  • nbd - Network block device.
  • parallels - Parallels virtualization disk image format.
  • qcow - Old QEMU image format. Only included for compatibility with older versions.
  • qed - Old QEMU image format. Only included for compatibility with older versions.
  • vdi - Oracle VM VirtualBox hard disk image format.
  • vhdx - Microsoft Hyper-V virtual hard disk-X disk image format.
  • vmdk - VMware 3 and 4 compatible image format.
  • vvfat - Virtual VFAT disk image format.

Chapter 15. KVM Migration

This chapter covers the migration of guest virtual machines from one host physical machine that runs the KVM hypervisor to another. Migrating guests is possible because virtual machines run in a virtualized environment instead of directly on the hardware.

15.1. Migration Definition and Benefits

Migration works by sending the state of the guest virtual machine's memory and any virtualized devices to a destination host physical machine. It is recommended to use shared, networked storage to store the guest's images to be migrated. It is also recommended to use libvirt-managed storage pools for shared storage when migrating virtual machines.
Migrations can be performed both with live (running) and non-live (shut-down) guests.
In a live migration, the guest virtual machine continues to run on the source host machine, while the guest's memory pages are transferred to the destination host machine. During migration, KVM monitors the source for any changes in pages it has already transferred, and begins to transfer these changes when all of the initial pages have been transferred. KVM also estimates transfer speed during migration, so when the remaining amount of data can be transferred within a certain configurable period of time (10ms by default), KVM suspends the original guest virtual machine, transfers the remaining data, and resumes the same guest virtual machine on the destination host physical machine.
In contrast, a non-live migration (offline migration) suspends the guest virtual machine and then copies the guest's memory to the destination host machine. The guest is then resumed on the destination host machine and the memory the guest used on the source host machine is freed. The time it takes to complete such a migration only depends on network bandwidth and latency. If the network is experiencing heavy use or low bandwidth, the migration will take much longer. Note that if the original guest virtual machine modifies pages faster than KVM can transfer them to the destination host physical machine, offline migration must be used, as live migration would never complete.
Migration is useful for:
Load balancing
Guest virtual machines can be moved to host physical machines with lower usage if their host machine becomes overloaded, or if another host machine is under-utilized.
Hardware independence
When you need to upgrade, add, or remove hardware devices on the host physical machine, you can safely relocate guest virtual machines to other host physical machines. This means that guest virtual machines do not experience any downtime for hardware improvements.
Energy saving
Virtual machines can be redistributed to other host physical machines, and the unloaded host systems can thus be powered off to save energy and cut costs in low usage periods.
Geographic migration
Virtual machines can be moved to another location for lower latency or when required by other reasons.

15.2. Migration Requirements and Limitations

Before using KVM migration, make sure that your system fulfills the migration's requirements, and that you are aware of its limitations.

Migration requirements

  • A guest virtual machine installed on shared storage using one of the following protocols:
    • Fibre Channel-based LUNs
    • iSCSI
    • NFS
    • GFS2
    • SCSI RDMA Protocol (SRP): the block export protocol used in Infiniband and 10GbE iWARP adapters
  • Make sure that the libvirtd service is enabled and running.
    # systemctl enable libvirtd.service
    # systemctl restart libvirtd.service
  • The ability to migrate effectively is dependent on the parameter settings in the /etc/libvirt/libvirtd.conf file. To edit this file, use the following procedure:

    Procedure 15.1. Configuring libvirtd.conf

    1. Open the /etc/libvirt/libvirtd.conf file as root:
      # vim /etc/libvirt/libvirtd.conf
    2. Change the parameters as needed and save the file.
    3. Restart the libvirtd service:
      # systemctl restart libvirtd
  • The migration platforms and versions should be checked against Table 15.1, “Live Migration Compatibility”
  • Use a separate system exporting the shared storage medium. Storage should not reside on either of the two host physical machines used for the migration.
  • Shared storage must mount at the same location on source and destination systems. The mounted directory names must be identical. Although it is possible to keep the images using different paths, it is not recommended. Note that, if you intend to use virt-manager to perform the migration, the path names must be identical. If you intend to use virsh to perform the migration, different network configurations and mount directories can be used with the help of --xml option or pre-hooks . For more information on pre-hooks, see the libvirt upstream documentation, and for more information on the XML option, see Chapter 23, Manipulating the Domain XML.
  • When migration is attempted on an existing guest virtual machine in a public bridge+tap network, the source and destination host machines must be located on the same network. Otherwise, the guest virtual machine network will not operate after migration.

Migration Limitations

  • Guest virtual machine migration has the following limitations when used on Red Hat Enterprise Linux with virtualization technology based on KVM:
    • Point to point migration – must be done manually to designate destination hypervisor from originating hypervisor
    • No validation or roll-back is available
    • Determination of target may only be done manually
    • Storage migration cannot be performed live on Red Hat Enterprise Linux 7, but you can migrate storage while the guest virtual machine is powered down. Live storage migration is available on Red Hat Virtualization. Call your service representative for details.

Note

If you are migrating a guest machine that has virtio devices on it, make sure to set the number of vectors on any virtio device on either platform to 32 or fewer. For detailed information, see Section 23.17, “Devices”.

15.3. Live Migration and Red Hat Enterprise Linux Version Compatibility

Live Migration is supported as shown in Table 15.1, “Live Migration Compatibility”:

Table 15.1. Live Migration Compatibility

Migration Method   Release Type    Example                   Live Migration Support   Notes
Forward            Major release   6.5+ → 7.x                Fully supported          Any issues should be reported
Backward           Major release   7.x → 6.y                 Not supported
Forward            Minor release   7.x → 7.y (7.0 → 7.1)     Fully supported          Any issues should be reported
Backward           Minor release   7.y → 7.x (7.1 → 7.0)     Fully supported          Any issues should be reported

Troubleshooting problems with migration

  • Issues with the migration protocol — If backward migration ends with "unknown section error", repeating the migration process can repair the issue as it may be a transient error. If not, report the problem.
  • Issues with audio devices — When migrating from Red Hat Enterprise Linux 6.x to Red Hat Enterprise Linux 7.y, note that the es1370 audio card is no longer supported. Use the ac97 audio card instead.
  • Issues with network cards — When migrating from Red Hat Enterprise Linux 6.x to Red Hat Enterprise Linux 7.y, note that the pcnet and ne2k_pci network cards are no longer supported. Use the virtio-net network device instead.
Configuring Network Storage

Configure shared storage and install a guest virtual machine on the shared storage.

15.4. Shared Storage Example: NFS for a Simple Migration

Important

This example uses NFS to share guest virtual machine images with other KVM host physical machines. Although not practical for large installations, it is presented to demonstrate migration techniques only. Do not use this example for migrating or running more than a few guest virtual machines. In addition, it is required that the sync parameter is enabled. This is required for proper export of the NFS storage.
iSCSI storage is a better choice for large deployments. For configuration details, see Section 13.2.3.5, “iSCSI-based storage pools”.
For detailed information on configuring NFS, opening IP tables, and configuring the firewall, see the Red Hat Enterprise Linux Storage Administration Guide.
Make sure that NFS file locking is not used as it is not supported in KVM.
  1. Export your libvirt image directory

    Migration requires storage to reside on a system that is separate from the migration source and target systems. On this separate system, export the storage by adding the default image directory to the /etc/exports file:
    /var/lib/libvirt/images *.example.com(rw,no_root_squash,sync)
    Change the hostname parameter as required for your environment.
  2. Start NFS

    1. Install the NFS packages if they are not yet installed:
      # yum install nfs-utils
    2. Make sure that the ports for NFS in iptables (2049, for example) are opened and add NFS to the /etc/hosts.allow file.
    3. Start the NFS service:
      # systemctl start nfs-server
  3. Mount the shared storage on the source and the destination

    On the migration source and the destination systems, mount the /var/lib/libvirt/images directory:
    # mount storage_host:/var/lib/libvirt/images /var/lib/libvirt/images

    Warning

    Whichever directory is chosen for the source host physical machine must be exactly the same as that on the destination host physical machine. This applies to all types of shared storage. The directory must be the same or the migration with virt-manager will fail.

15.5. Live KVM Migration with virsh

A guest virtual machine can be migrated to another host physical machine with the virsh command. The migrate command accepts parameters in the following format:
# virsh migrate --live GuestName DestinationURL
Note that the --live option may be omitted when live migration is not required. Additional options are listed in Section 15.5.2, “Additional Options for the virsh migrate Command”.
The GuestName parameter represents the name of the guest virtual machine which you want to migrate.
The DestinationURL parameter is the connection URL of the destination host physical machine. The destination system must run the same version of Red Hat Enterprise Linux, be using the same hypervisor and have libvirt running.

Note

The DestinationURL parameter for normal migration and peer2peer migration has different semantics:
  • normal migration: the DestinationURL is the URL of the target host physical machine as seen from the source guest virtual machine.
  • peer2peer migration: DestinationURL is the URL of the target host physical machine as seen from the source host physical machine.
Once the command is entered, you will be prompted for the root password of the destination system.

Important

Name resolution must be working on both sides (source and destination) in order for migration to succeed. Each side must be able to find the other. Make sure that you can ping one side to the other to check that the name resolution is working.
Example: live migration with virsh

This example migrates from host1.example.com to host2.example.com. Change the host physical machine names for your environment. This example migrates a virtual machine named guest1-rhel7-64.

This example assumes you have fully configured shared storage and meet all the prerequisites (listed here: Migration requirements).
  1. Verify the guest virtual machine is running

    From the source system, host1.example.com, verify guest1-rhel6-64 is running:
    [root@host1 ~]# virsh list
    Id Name                 State
    ----------------------------------
     10 guest1-rhel7-64     running
    
  2. Migrate the guest virtual machine

    Execute the following command to live migrate the guest virtual machine to the destination, host2.example.com. Append /system to the end of the destination URL to tell libvirt that you need full access.
    # virsh migrate --live guest1-rhel7-64 qemu+ssh://host2.example.com/system
    Once the command is entered you will be prompted for the root password of the destination system.
  3. Wait

    The migration may take some time depending on load and the size of the guest virtual machine. virsh only reports errors. The guest virtual machine continues to run on the source host physical machine until fully migrated.
  4. Verify the guest virtual machine has arrived at the destination host

    From the destination system, host2.example.com, verify guest1-rhel7-64 is running:
    [root@host2 ~]# virsh list
    Id Name                 State
    ----------------------------------
     10 guest1-rhel7-64     running
    
The live migration is now complete.
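While waiting in step 3, you can optionally monitor the migration from another shell on the source host. The virsh domjobinfo command reports statistics for the active migration job, such as the amount of data processed and remaining; this is a minimal example using the guest name from the procedure above:
[root@host1 ~]# virsh domjobinfo guest1-rhel7-64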

Note

libvirt supports a variety of networking methods including TLS/SSL, UNIX sockets, SSH, and unencrypted TCP. For more information on using other methods, see Chapter 18, Remote Management of Guests.

Note

Non-running guest virtual machines can be migrated using the following command:
# virsh migrate --offline --persistent 
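For example, reusing the guest and destination from the earlier live migration example, an offline migration could look like the following (a sketch; adjust the names for your environment):
# virsh migrate --offline --persistent guest1-rhel7-64 qemu+ssh://host2.example.com/system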

15.5.1. Additional Tips for Migration with virsh

It is possible to perform multiple, concurrent live migrations, with each migration running in a separate command shell. However, do this with caution and calculate the limits carefully, because each migration instance uses one connection, counted against max_clients, on each side (source and target). As the default setting is 20, 10 instances can run without changing the settings. Should you need to change the settings, see Procedure 15.1, “Configuring libvirtd.conf”.
  1. Open the libvirtd.conf file as described in Procedure 15.1, “Configuring libvirtd.conf”.
  2. Look for the Processing controls section.
    #################################################################
    #
    # Processing controls
    #
    
    # The maximum number of concurrent client connections to allow
    # over all sockets combined.
    #max_clients = 5000
    
    # The maximum length of queue of connections waiting to be
    # accepted by the daemon. Note, that some protocols supporting
    # retransmission may obey this so that a later reattempt at
    # connection succeeds.
    #max_queued_clients = 1000
    
    # The minimum limit sets the number of workers to start up
    # initially. If the number of active clients exceeds this,
    # then more threads are spawned, upto max_workers limit.
    # Typically you'd want max_workers to equal maximum number
    # of clients allowed
    #min_workers = 5
    #max_workers = 20
    
    
    # The number of priority workers. If all workers from above
    # pool will stuck, some calls marked as high priority
    # (notably domainDestroy) can be executed in this pool.
    #prio_workers = 5
    
    # Total global limit on concurrent RPC calls. Should be
    # at least as large as max_workers. Beyond this, RPC requests
    # will be read into memory and queued. This directly impact
    # memory usage, currently each request requires 256 KB of
    # memory. So by default upto 5 MB of memory is used
    #
    # XXX this isn't actually enforced yet, only the per-client
    # limit is used so far
    #max_requests = 20
    
    # Limit on concurrent requests from a single client
    # connection. To avoid one client monopolizing the server
    # this should be a small fraction of the global max_requests
    # and max_workers parameter
    #max_client_requests = 5
    
    #################################################################
    
  3. Change the max_clients and max_workers parameter settings. It is recommended that both parameters be set to the same value. Each migration uses 2 clients (one per side) against max_clients, and uses 1 worker on the source and 0 workers on the destination during the perform phase, plus 1 worker on the destination during the finish phase.

    Important

    The max_clients and max_workers parameter settings apply to all guest virtual machine connections to the libvirtd service. This means that any user who is using the same guest virtual machine and performing a migration at the same time is also subject to the limits set in the max_clients and max_workers parameters. This is why the maximum value needs to be considered carefully before performing a concurrent live migration.

    Important

    The max_clients parameter controls how many clients are allowed to connect to libvirt. When a large number of containers are started at once, this limit can be easily reached and exceeded. The value of the max_clients parameter could be increased to avoid this, but doing so can leave the system more vulnerable to denial of service (DoS) attacks against instances. To alleviate this problem, a new max_anonymous_clients setting has been introduced in Red Hat Enterprise Linux 7.0 that specifies a limit of connections which are accepted but not yet authenticated. You can implement a combination of max_clients and max_anonymous_clients to suit your workload.
  4. Save the file and restart the service.

    Note

    There may be cases where a migration connection drops because there are too many ssh sessions that have been started, but not yet authenticated. By default, sshd allows only 10 sessions to be in a "pre-authenticated state" at any time. This setting is controlled by the MaxStartups parameter in the sshd configuration file (located here: /etc/ssh/sshd_config), which may require some adjustment. Adjusting this parameter should be done with caution as the limitation is put in place to prevent DoS attacks (and over-use of resources in general). Setting this value too high will negate its purpose. To change this parameter, edit the file /etc/ssh/sshd_config, remove the # from the beginning of the MaxStartups line, and change the 10 (default value) to a higher number. Remember to save the file and restart the sshd service. For more information, see the sshd_config man page.
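As a concrete sketch of steps 3 and 4 above, the following uses illustrative values; uncomment the two lines in /etc/libvirt/libvirtd.conf, set both parameters to the same number, and restart the service:
# vi /etc/libvirt/libvirtd.conf
max_clients = 20
max_workers = 20
# systemctl restart libvirtd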

15.5.2. Additional Options for the virsh migrate Command

In addition to --live, virsh migrate accepts the following options:
  • --direct - used for direct migration
  • --p2p - used for peer-to-peer migration
  • --tunneled - used for tunneled migration
  • --offline - migrates domain definition without starting the domain on destination and without stopping it on source host. Offline migration may be used with inactive domains and it must be used with the --persistent option.
  • --persistent - leaves the domain persistent on destination host physical machine
  • --undefinesource - undefines the domain on the source host physical machine
  • --suspend - leaves the domain paused on the destination host physical machine
  • --change-protection - enforces that no incompatible configuration changes will be made to the domain while the migration is underway; this flag is implicitly enabled when supported by the hypervisor, but can be explicitly used to reject the migration if the hypervisor lacks change protection support.
  • --unsafe - forces the migration to occur, ignoring all safety procedures.
  • --verbose - displays the progress of migration as it is occurring
  • --compressed - activates compression of memory pages that have to be transferred repeatedly during live migration.
  • --abort-on-error - cancels the migration if a soft error (for example I/O error) happens during the migration.
  • --domain [name] - sets the domain name, id or uuid.
  • --desturi [URI] - connection URI of the destination host as seen from the client (normal migration) or source (p2p migration).
  • --migrateuri [URI] - the migration URI, which can usually be omitted.
  • --graphicsuri [URI] - graphics URI to be used for seamless graphics migration.
  • --listen-address [address] - sets the listen address that the hypervisor on the destination side should bind to for incoming migration.
  • --timeout [seconds] - forces a guest virtual machine to suspend when the live migration counter exceeds the specified number of seconds. It can only be used with a live migration. Once the timeout is initiated, the migration continues on the suspended guest virtual machine.
  • --dname [newname] - is used for renaming the domain during migration, which can also usually be omitted
  • --xml [filename] - the file indicated can be used to supply an alternative XML file for use on the destination to supply a larger set of changes to any host-specific portions of the domain XML, such as accounting for naming differences between source and destination in accessing underlying storage. This option is usually omitted.
  • --migrate-disks [disk_identifiers] - this option can be used to select which disks are copied during the migration. This allows for more efficient live migration when copying certain disks is undesirable, such as when they already exist on the destination, or when they are no longer useful. [disk_identifiers] should be replaced by a comma-separated list of disks to be migrated, identified by their arguments found in the <target dev= /> line of the Domain XML file.
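For example, several of these options can be combined in a single command; the following sketch reuses the guest and destination from the example in Section 15.5 and is only illustrative:
# virsh migrate --live --verbose --compressed --persistent --undefinesource guest1-rhel7-64 qemu+ssh://host2.example.com/system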
In addition, the following commands may help as well:
  • virsh migrate-setmaxdowntime [domain] [downtime] - sets a maximum tolerable downtime, in milliseconds, for a domain that is being live-migrated to another host. The domain specified must be the same domain that is being migrated.
  • virsh migrate-compcache [domain] --size - sets or gets the size of the cache, in bytes, used for compressing repeatedly transferred memory pages during a live migration. When --size is not used, the command displays the current size of the compression cache. When --size is used and specified in bytes, the hypervisor is asked to change the compression cache to the indicated size, after which the current size is displayed. The --size argument should be used while the domain is being live migrated, in reaction to the migration progress and an increasing number of compression cache misses reported by domjobinfo.
  • virsh migrate-setspeed [domain] [bandwidth] - sets the migration bandwidth, in MiB/s, for the specified domain which is being migrated to another host.
  • virsh migrate-getspeed [domain] - gets the maximum migration bandwidth available, in MiB/s, for the specified domain.
For more information, see Migration Limitations or the virsh man page.
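For example, while the guest from the earlier example is being migrated, the following could be run from a second shell on the source host (the values are illustrative):
# virsh migrate-setmaxdowntime guest1-rhel7-64 500
# virsh migrate-setspeed guest1-rhel7-64 100
# virsh migrate-getspeed guest1-rhel7-64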

15.6. Migrating with virt-manager

This section covers migrating a KVM guest virtual machine with virt-manager from one host physical machine to another.
  1. Connect to the target host physical machine

    In the virt-manager interface, connect to the target host physical machine by selecting the File menu, then click Add Connection.
  2. Add connection

    The Add Connection window appears.
    Adding a connection to the target host physical machine

    Figure 15.1. Adding a connection to the target host physical machine

    Enter the following details:
    • Hypervisor: Select QEMU/KVM.
    • Method: Select the connection method.
    • Username: Enter the user name for the remote host physical machine.
    • Hostname: Enter the host name for the remote host physical machine.

    Note

    For more information on the connection options, see Section 19.5, “Adding a Remote Connection”.
    Click Connect. An SSH connection is used in this example, so the specified user's password must be entered in the next step.
    Enter password

    Figure 15.2. Enter password

  3. Configure shared storage

    Ensure that both the source and the target host are sharing storage, for example using NFS.
  4. Migrate guest virtual machines

    Right-click the guest that is to be migrated, and click Migrate.
    In the New Host field, use the drop-down list to select the host physical machine you wish to migrate the guest virtual machine to and click Migrate.
    Choosing the destination host physical machine and starting the migration process

    Figure 15.3. Choosing the destination host physical machine and starting the migration process

    A progress window appears.
    Progress window

    Figure 15.4. Progress window

    If the migration finishes without any problems, virt-manager displays the newly migrated guest virtual machine running in the destination host.
    Migrated guest virtual machine running in the destination host physical machine

    Figure 15.5. Migrated guest virtual machine running in the destination host physical machine

Chapter 16. Guest Virtual Machine Device Configuration

Red Hat Enterprise Linux 7 supports three classes of devices for guest virtual machines:
  • Emulated devices are purely virtual devices that mimic real hardware, allowing unmodified guest operating systems to work with them using their standard in-box drivers.
  • Virtio devices (also known as paravirtualized) are purely virtual devices designed to work optimally in a virtual machine. Virtio devices are similar to emulated devices, but non-Linux virtual machines do not include the drivers they require by default. Virtualization management software like the Virtual Machine Manager (virt-manager) and the Red Hat Virtualization Hypervisor install these drivers automatically for supported non-Linux guest operating systems. Red Hat Enterprise Linux 7 supports up to 216 virtio devices. For more information, see Chapter 5, KVM Paravirtualized (virtio) Drivers.
  • Assigned devices are physical devices that are exposed to the virtual machine. This method is also known as passthrough. Device assignment allows virtual machines exclusive access to PCI devices for a range of tasks, and allows PCI devices to appear and behave as if they were physically attached to the guest operating system. Red Hat Enterprise Linux 7 supports up to 32 assigned devices per virtual machine.
    Device assignment is supported on PCIe devices, including select graphics devices. Parallel PCI devices may be supported as assigned devices, but they have severe limitations due to security and system configuration conflicts.
Red Hat Enterprise Linux 7 supports PCI hot plug of devices exposed as single-function slots to the virtual machine. Single-function host devices and individual functions of multi-function host devices may be configured to enable this. Configurations exposing devices as multi-function PCI slots to the virtual machine are recommended only for non-hotplug applications.
For more information on specific devices and related limitations, see Section 23.17, “Devices”.

Note

Platform support for interrupt remapping is required to fully isolate a guest with assigned devices from the host. Without such support, the host may be vulnerable to interrupt injection attacks from a malicious guest. In an environment where guests are trusted, the admin may opt-in to still allow PCI device assignment using the allow_unsafe_interrupts option to the vfio_iommu_type1 module. This may either be done persistently by adding a .conf file (for example local.conf) to /etc/modprobe.d containing the following:
options vfio_iommu_type1 allow_unsafe_interrupts=1
or dynamically using the sysfs entry to do the same:
# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

16.1. PCI Devices

PCI device assignment is only available on hardware platforms supporting either Intel VT-d or AMD IOMMU. These Intel VT-d or AMD IOMMU specifications must be enabled in the host BIOS for PCI device assignment to function.

Procedure 16.1. Preparing an Intel system for PCI device assignment

  1. Enable the Intel VT-d specifications

    The Intel VT-d specifications provide hardware support for directly assigning a physical device to a virtual machine. These specifications are required to use PCI device assignment with Red Hat Enterprise Linux.
    The Intel VT-d specifications must be enabled in the BIOS. Some system manufacturers disable these specifications by default. The terms used to refer to these specifications can differ between manufacturers; consult your system manufacturer's documentation for the appropriate terms.
  2. Activate Intel VT-d in the kernel

    Activate Intel VT-d in the kernel by adding the intel_iommu=on and iommu=pt parameters to the end of the GRUB_CMDLINE_LINUX line, within the quotes, in the /etc/sysconfig/grub file.
    The example below is a modified grub file with Intel VT-d activated.
    GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_VolGroup00/LogVol01
    vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_VolGroup_1/root
    vconsole.keymap=us $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/
    rhcrashkernel-param || :) rhgb quiet intel_iommu=on iommu=pt"
  3. Regenerate config file

    Regenerate /etc/grub2.cfg by running:
    grub2-mkconfig -o /etc/grub2.cfg
    Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
  4. Ready to use

    Reboot the system to enable the changes. Your system is now capable of PCI device assignment.
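    One way to confirm after the reboot that the kernel has activated the IOMMU is to search the kernel boot messages. The exact messages vary by platform and kernel version; on Intel systems they typically mention DMAR. For example:
    # dmesg | grep -i -e DMAR -e IOMMU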

Procedure 16.2. Preparing an AMD system for PCI device assignment

  1. Enable the AMD IOMMU specifications

    The AMD IOMMU specifications are required to use PCI device assignment in Red Hat Enterprise Linux. These specifications must be enabled in the BIOS. Some system manufacturers disable these specifications by default.
  2. Enable IOMMU kernel support

    Append iommu=pt to the end of the GRUB_CMDLINE_LINUX line, within the quotes, in /etc/sysconfig/grub so that the AMD IOMMU specifications are enabled at boot.
  3. Regenerate config file

    Regenerate /etc/grub2.cfg by running:
    grub2-mkconfig -o /etc/grub2.cfg
    Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
  4. Ready to use

    Reboot the system to enable the changes. Your system is now capable of PCI device assignment.
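    As with the Intel procedure, you can confirm after the reboot that the AMD IOMMU is active by searching the kernel boot messages; the exact messages vary by platform, but typically mention AMD-Vi. For example:
    # dmesg | grep -i AMD-Vi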

Note

For further information on IOMMU, see Appendix E, Working with IOMMU Groups.

16.1.1. Assigning a PCI Device with virsh

These steps cover assigning a PCI device to a virtual machine on a KVM hypervisor.
This example uses a PCIe network controller with the PCI identifier code, pci_0000_01_00_0, and a fully virtualized guest machine named guest1-rhel7-64.

Procedure 16.3. Assigning a PCI device to a guest virtual machine with virsh

  1. Identify the device

    First, identify the PCI device designated for device assignment to the virtual machine. Use the lspci command to list the available PCI devices. You can refine the output of lspci with grep.
    This example uses the Ethernet controller highlighted in the following output:
    # lspci | grep Ethernet
    00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    This Ethernet controller is shown with the short identifier 00:19.0. We need to find out the full identifier used by virsh in order to assign this PCI device to a virtual machine.
    To do so, use the virsh nodedev-list command to list all devices of a particular type (pci) that are attached to the host machine. Then look at the output for the string that maps to the short identifier of the device you wish to use.
    This example shows the string that maps to the Ethernet controller with the short identifier 00:19.0. Note that the : and . characters are replaced with underscores in the full identifier.
    # virsh nodedev-list --cap pci
    pci_0000_00_00_0
    pci_0000_00_01_0
    pci_0000_00_03_0
    pci_0000_00_07_0
    pci_0000_00_10_0
    pci_0000_00_10_1
    pci_0000_00_14_0
    pci_0000_00_14_1
    pci_0000_00_14_2
    pci_0000_00_14_3
    pci_0000_00_19_0
    pci_0000_00_1a_0
    pci_0000_00_1a_1
    pci_0000_00_1a_2
    pci_0000_00_1a_7
    pci_0000_00_1b_0
    pci_0000_00_1c_0
    pci_0000_00_1c_1
    pci_0000_00_1c_4
    pci_0000_00_1d_0
    pci_0000_00_1d_1
    pci_0000_00_1d_2
    pci_0000_00_1d_7
    pci_0000_00_1e_0
    pci_0000_00_1f_0
    pci_0000_00_1f_2
    pci_0000_00_1f_3
    pci_0000_01_00_0
    pci_0000_01_00_1
    pci_0000_02_00_0
    pci_0000_02_00_1
    pci_0000_06_00_0
    pci_0000_07_02_0
    pci_0000_07_03_0
    Record the PCI device number that maps to the device you want to use; this is required in other steps.
  2. Review device information

    Information on the domain, bus, and function are available from output of the virsh nodedev-dumpxml command:
    
    # virsh nodedev-dumpxml pci_0000_00_19_0
    <device>
      <name>pci_0000_00_19_0</name>
      <parent>computer</parent>
      <driver>
        <name>e1000e</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>0</bus>
        <slot>25</slot>
        <function>0</function>
        <product id='0x1502'>82579LM Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <iommuGroup number='7'>
          <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
        </iommuGroup>
      </capability>
    </device>
    

    Figure 16.1. Dump contents

    Note

    An IOMMU group is determined based on the visibility and isolation of devices from the perspective of the IOMMU. Each IOMMU group may contain one or more devices. When multiple devices are present, all endpoints within the IOMMU group must be claimed for any device within the group to be assigned to a guest. This can be accomplished either by also assigning the extra endpoints to the guest or by detaching them from the host driver using virsh nodedev-detach. Devices contained within a single group may not be split between multiple guests or split between host and guest. Non-endpoint devices such as PCIe root ports, switch ports, and bridges should not be detached from the host drivers and will not interfere with assignment of endpoints.
    Devices within an IOMMU group can be determined using the iommuGroup section of the virsh nodedev-dumpxml output. Each member of the group is provided in a separate "address" field. This information may also be found in sysfs using the following:
    $ ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/
    An example of the output from this would be:
    0000:01:00.0  0000:01:00.1
    To assign only 0000:01:00.0 to the guest, the unused endpoint should be detached from the host before starting the guest:
    $ virsh nodedev-detach pci_0000_01_00_1
  3. Determine required configuration details

    See the output from the virsh nodedev-dumpxml pci_0000_00_19_0 command for the values required for the configuration file.
    The example device has the following values: bus = 0, slot = 25 and function = 0. The decimal configuration uses those three values:
    bus='0'
    slot='25'
    function='0'
  4. Add configuration details

    Run virsh edit, specifying the virtual machine name, and add a device entry in the <devices> section to assign the PCI device to the guest virtual machine. For example:
    # virsh edit guest1-rhel7-64
    
    <devices>
    	[...]
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
          <address domain='0' bus='0' slot='25' function='0'/>
       </source>
     </hostdev>
     [...]
    </devices>
    

    Figure 16.2. Add PCI device

    Alternately, run virsh attach-device, specifying the virtual machine name and the guest's XML file:
    virsh attach-device guest1-rhel7-64 file.xml

    Note

    PCI devices may include an optional read-only memory (ROM) module, also known as an option ROM or expansion ROM, for delivering device firmware or pre-boot drivers (such as PXE) for the device. Generally, these option ROMs also work in a virtualized environment when using PCI device assignment to attach a physical PCI device to a VM.
    However, in some cases, the option ROM can be unnecessary, which may cause the VM to boot more slowly, or the pre-boot driver delivered by the device can be incompatible with virtualization, which may cause the guest OS boot to fail. In such cases, Red Hat recommends masking the option ROM from the VM. To do so:
    1. On the host, verify that the device to assign has an expansion ROM base address register (BAR). To do so, use the lspci -v command for the device, and check the output for a line that includes the following:
      Expansion ROM at
    2. Add the <rom bar='off'/> element as a child of the <hostdev> element in the guest's XML configuration:
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
           <address domain='0' bus='0' slot='25' function='0'/>
        </source>
        <rom bar='off'/>
      </hostdev>
      
  5. Start the virtual machine

    # virsh start guest1-rhel7-64
The PCI device should now be successfully assigned to the virtual machine, and accessible to the guest operating system.
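To confirm the assignment from the host, you can check the guest's live XML for the <hostdev> entry; from within the guest, the device should also appear in the lspci output. The following is a minimal sketch using the guest name from the procedure above:
# virsh dumpxml guest1-rhel7-64 | grep -A 4 "<hostdev"
From within the guest:
# lspci | grep Ethernet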

16.1.2. Assigning a PCI Device with virt-manager

PCI devices can be added to guest virtual machines using the graphical virt-manager tool. The following procedure adds a Gigabit Ethernet controller to a guest virtual machine.

Procedure 16.4. Assigning a PCI device to a guest virtual machine using virt-manager

  1. Open the hardware settings

    Open the guest virtual machine and click the Add Hardware button to add a new device to the virtual machine.
    The virtual machine hardware window with the Information button selected on the top taskbar and Overview selected on the left menu pane.

    Figure 16.3. The virtual machine hardware information window

  2. Select a PCI device

    Select PCI Host Device from the Hardware list on the left.
    Select an unused PCI device. Note that selecting PCI devices presently in use by another guest causes errors. In this example, a spare audio controller is used. Click Finish to complete setup.
    The Add new virtual hardware wizard with PCI Host Device selected on the left menu pane, showing a list of host devices for selection in the right menu pane.

    Figure 16.4. The Add new virtual hardware wizard

  3. Add the new device

    The setup is complete and the guest virtual machine now has direct access to the PCI device.
    The virtual machine hardware window with the Information button selected on the top taskbar and Overview selected on the left menu pane, displaying the newly added PCI Device in the list of virtual machine devices in the left menu pane.

    Figure 16.5. The virtual machine hardware information window

Note

If device assignment fails, there may be other endpoints in the same IOMMU group that are still attached to the host. There is no way to retrieve group information using virt-manager, but virsh commands can be used to analyze the bounds of the IOMMU group and if necessary sequester devices.
See the Note in Section 16.1.1, “Assigning a PCI Device with virsh” for more information on IOMMU groups and how to detach endpoint devices using virsh.

16.1.3. PCI Device Assignment with virt-install

It is possible to assign a PCI device when installing a guest using the virt-install command. To do this, use the --host-device parameter.

Procedure 16.5. Assigning a PCI device to a virtual machine with virt-install

  1. Identify the device

    Identify the PCI device designated for device assignment to the guest virtual machine.
    # lspci | grep Ethernet
    00:19.0 Ethernet controller: Intel Corporation 82567LM-2 Gigabit Network Connection
    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    The virsh nodedev-list command lists all devices attached to the system, and identifies each PCI device with a string. To limit output to only PCI devices, enter the following command:
    # virsh nodedev-list --cap pci
    pci_0000_00_00_0
    pci_0000_00_01_0
    pci_0000_00_03_0
    pci_0000_00_07_0
    pci_0000_00_10_0
    pci_0000_00_10_1
    pci_0000_00_14_0
    pci_0000_00_14_1
    pci_0000_00_14_2
    pci_0000_00_14_3
    pci_0000_00_19_0
    pci_0000_00_1a_0
    pci_0000_00_1a_1
    pci_0000_00_1a_2
    pci_0000_00_1a_7
    pci_0000_00_1b_0
    pci_0000_00_1c_0
    pci_0000_00_1c_1
    pci_0000_00_1c_4
    pci_0000_00_1d_0
    pci_0000_00_1d_1
    pci_0000_00_1d_2
    pci_0000_00_1d_7
    pci_0000_00_1e_0
    pci_0000_00_1f_0
    pci_0000_00_1f_2
    pci_0000_00_1f_3
    pci_0000_01_00_0
    pci_0000_01_00_1
    pci_0000_02_00_0
    pci_0000_02_00_1
    pci_0000_06_00_0
    pci_0000_07_02_0
    pci_0000_07_03_0
    Record the PCI device number; the number is needed in other steps.
    Information on the domain, bus and function are available from output of the virsh nodedev-dumpxml command:
    # virsh nodedev-dumpxml pci_0000_01_00_0
    
    <device>
      <name>pci_0000_01_00_0</name>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>1</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <iommuGroup number='7'>
          <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
          <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
        </iommuGroup>
      </capability>
    </device>
    

    Figure 16.6. PCI device file contents

    Note

    If there are multiple endpoints in the IOMMU group and not all of them are assigned to the guest, you will need to manually detach the other endpoint(s) from the host by running the following command before you start the guest:
    $ virsh nodedev-detach pci_0000_01_00_1
    See the Note in Section 16.1.1, “Assigning a PCI Device with virsh” for more information on IOMMU groups.
  2. Add the device

    Use the PCI identifier output from the virsh nodedev-list command as the value for the --host-device parameter.
    virt-install \
    --name=guest1-rhel7-64 \
    --disk path=/var/lib/libvirt/images/guest1-rhel7-64.img,size=8 \
    --vcpus=2 --ram=2048 \
    --location=http://example1.com/installation_tree/RHEL7.0-Server-x86_64/os \
    --nonetworks \
    --os-type=linux \
    --os-variant=rhel7 \
    --host-device=pci_0000_01_00_0
  3. Complete the installation

    Complete the guest installation. The PCI device should be attached to the guest.

16.1.4. Detaching an Assigned PCI Device

When a host PCI device has been assigned to a guest machine, the host can no longer use the device. If the PCI device is in managed mode (configured using the managed='yes' parameter in the domain XML file), it is automatically detached from the host, attached to the guest machine, and re-attached to the host machine as necessary. If the PCI device is not in managed mode, you can detach the PCI device from the guest machine and re-attach it to the host manually using virsh or virt-manager.

Procedure 16.6. Detaching a PCI device from a guest with virsh

  1. Detach the device

    Use the following command to detach the PCI device from the guest, where file.xml describes the device to be removed from the guest's configuration:
    # virsh detach-device name_of_guest file.xml
  2. Re-attach the device to the host (optional)

    If the device is in managed mode, skip this step. The device will be returned to the host automatically.
    If the device is not using managed mode, use the following command to re-attach the PCI device to the host machine:
    # virsh nodedev-reattach device
    For example, to re-attach the pci_0000_01_00_0 device to the host:
    # virsh nodedev-reattach pci_0000_01_00_0
    The device is now available for host use.
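    To confirm that the host can use the device again, you can check which kernel driver is bound to it. For the example device, the following shows the driver in use; the output varies by system:
    # lspci -k -s 01:00.0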

Procedure 16.7. Detaching a PCI Device from a guest with virt-manager

  1. Open the virtual hardware details screen

    In virt-manager, double-click the virtual machine that contains the device. Select the Show virtual hardware details button to display a list of virtual hardware.
    The Show virtual hardware details button.

    Figure 16.7. The virtual hardware details button

  2. Select and remove the device

    Select the PCI device to be detached from the list of virtual devices in the left panel.
    The PCI device details and the Remove button.

    Figure 16.8. Selecting the PCI device to be detached

    Click the Remove button to confirm. The device is now available for host use.

16.1.5. PCI Bridges

Peripheral Component Interconnect (PCI) bridges are used to attach devices such as network cards, modems, and sound cards. Just like their physical counterparts, virtual devices can be attached to a PCI bridge. In the past, only 31 PCI devices could be added to any guest virtual machine. Now, when a 31st PCI device is added, a PCI bridge is automatically placed in the 31st slot, moving the additional PCI device to the PCI bridge. Each PCI bridge has 31 slots for 31 additional devices, all of which can be bridges. In this manner, over 900 devices can be available for guest virtual machines.
For an example of an XML configuration for PCI bridges, see Domain XML example for PCI Bridge. Note that this configuration is set up automatically, and it is not recommended to adjust manually.

16.1.6. PCI Device Assignment Restrictions

PCI device assignment (attaching PCI devices to virtual machines) requires host systems to have AMD IOMMU or Intel VT-d support to enable device assignment of PCIe devices.
Red Hat Enterprise Linux 7 limits PCI configuration space access by guest device drivers. This limitation could cause drivers that depend on device capabilities or features present in the extended PCI configuration space to fail to configure.
There is a limit of 32 total assigned devices per Red Hat Enterprise Linux 7 virtual machine. This translates to 32 total PCI functions, regardless of the number of PCI bridges present in the virtual machine or how those functions are combined to create multi-function slots.
Platform support for interrupt remapping is required to fully isolate a guest with assigned devices from the host. Without such support, the host may be vulnerable to interrupt injection attacks from a malicious guest. In an environment where guests are trusted, the administrator may opt-in to still allow PCI device assignment using the allow_unsafe_interrupts option to the vfio_iommu_type1 module. This may either be done persistently by adding a .conf file (for example local.conf) to /etc/modprobe.d containing the following:
options vfio_iommu_type1 allow_unsafe_interrupts=1
or dynamically using the sysfs entry to do the same:
# echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

16.2. PCI Device Assignment with SR-IOV Devices

A PCI network device (specified in the domain XML by the <source> element) can be directly connected to the guest using direct device assignment (sometimes referred to as passthrough). Due to limitations in standard single-port PCI ethernet card driver design, only Single Root I/O Virtualization (SR-IOV) virtual function (VF) devices can be assigned in this manner; to assign a standard single-port PCI or PCIe Ethernet card to a guest, use the traditional <hostdev> device definition.

  <devices>
    <interface type='hostdev'>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
      </source>
      <mac address='52:54:00:6d:90:02'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='finance'/>
      </virtualport>
    </interface>
  </devices>

Figure 16.9. XML example for PCI device assignment

Developed by the PCI-SIG (PCI Special Interest Group), the Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device to multiple virtual machines. SR-IOV improves device performance for virtual machines.
How SR-IOV works

Figure 16.10. How SR-IOV works

SR-IOV enables a Single Root Function (for example, a single Ethernet port) to appear as multiple, separate physical devices. A physical device with SR-IOV capabilities can be configured to appear in the PCI configuration space as multiple functions. Each device has its own configuration space complete with Base Address Registers (BARs).
SR-IOV uses two PCI functions:
  • Physical Functions (PFs) are full PCIe devices that include the SR-IOV capabilities. Physical Functions are discovered, managed, and configured as normal PCI devices. Physical Functions configure and manage the SR-IOV functionality by assigning Virtual Functions.
  • Virtual Functions (VFs) are simple PCIe functions that only process I/O. Each Virtual Function is derived from a Physical Function. The number of Virtual Functions a device may have is limited by the device hardware. A single Ethernet port, the Physical Device, may map to many Virtual Functions that can be shared to virtual machines.
The hypervisor can assign one or more Virtual Functions to a virtual machine. The Virtual Function's configuration space is then assigned to the configuration space presented to the guest.
Each Virtual Function can only be assigned to a single guest at a time, as Virtual Functions require real hardware resources. A virtual machine can have multiple Virtual Functions. A Virtual Function appears as a network card in the same way as a normal network card would appear to an operating system.
The SR-IOV drivers are implemented in the kernel. The core implementation is contained in the PCI subsystem, but there must also be driver support for both the Physical Function (PF) and Virtual Function (VF) devices. An SR-IOV capable device can allocate VFs from a PF. The VFs appear as PCI devices which are backed on the physical PCI device by resources such as queues and register sets.
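On the host, the kernel exposes this PF/VF relationship through sysfs. For example, for a network PF, the following files report the maximum number of VFs the hardware supports and the number currently allocated (the device name enp14s0f0 is illustrative):
# cat /sys/class/net/enp14s0f0/device/sriov_totalvfs
# cat /sys/class/net/enp14s0f0/device/sriov_numvfs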

16.2.1. Advantages of SR-IOV

SR-IOV devices can share a single physical port with multiple virtual machines.
When an SR-IOV VF is assigned to a virtual machine, it can be configured to (transparently to the virtual machine) place all network traffic leaving the VF onto a particular VLAN. The virtual machine cannot detect that its traffic is being tagged for a VLAN, and will be unable to change or eliminate this tagging.
Virtual Functions have near-native performance and provide better performance than paravirtualized drivers and emulated access. Virtual Functions provide data protection between virtual machines on the same physical server as the data is managed and controlled by the hardware.
These features allow for increased virtual machine density on hosts within a data center.
SR-IOV is better able to utilize the bandwidth of devices with multiple guests.

16.2.2. Using SR-IOV

This section covers the use of PCI passthrough to assign a Virtual Function of an SR-IOV capable multiport network card to a virtual machine as a network device.
SR-IOV Virtual Functions (VFs) can be assigned to virtual machines by adding a device entry in <hostdev> with the virsh edit or virsh attach-device command. However, this can be problematic because unlike a regular network device, an SR-IOV VF network device does not have a permanent unique MAC address, and is assigned a new MAC address each time the host is rebooted. Because of this, even if the guest is assigned the same VF after a reboot, the guest sees its network adapter as having a new MAC address whenever the host is rebooted. As a result, the guest believes new hardware has been connected each time, and usually requires re-configuration of its network settings.
libvirt contains the <interface type='hostdev'> interface device. Using this interface device, libvirt will first perform any network-specific hardware/switch initialization indicated (such as setting the MAC address, VLAN tag, or 802.1Qbh virtualport parameters), then perform the PCI device assignment to the guest.
Using the <interface type='hostdev'> interface device requires:
  • an SR-IOV-capable network card,
  • host hardware that supports either the Intel VT-d or the AMD IOMMU extensions
  • the PCI address of the VF to be assigned.

Important

Assignment of an SR-IOV device to a virtual machine requires that the host hardware supports the Intel VT-d or the AMD IOMMU specification.
To attach an SR-IOV network device on an Intel or an AMD system, follow this procedure:

Procedure 16.8. Attach an SR-IOV network device on an Intel or AMD system

  1. Enable Intel VT-d or the AMD IOMMU specifications in the BIOS and kernel

    On an Intel system, enable Intel VT-d in the BIOS if it is not enabled already. See Procedure 16.1, “Preparing an Intel system for PCI device assignment” for procedural help on enabling Intel VT-d in the BIOS and kernel.
    Skip this step if Intel VT-d is already enabled and working.
    On an AMD system, enable the AMD IOMMU specifications in the BIOS if they are not enabled already. See Procedure 16.2, “Preparing an AMD system for PCI device assignment” for procedural help on enabling IOMMU in the BIOS.
  2. Verify support

    Verify if the PCI device with SR-IOV capabilities is detected. This example lists an Intel 82576 network interface card which supports SR-IOV. Use the lspci command to verify whether the device was detected.
    # lspci
    03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    Note that the output has been modified to remove all other devices.
  3. Activate Virtual Functions

    Run the following command, replacing ${num_vfs} with the number of Virtual Functions to create (up to the limit supported by the device) and enp14s0f0 with the network device name of the Physical Function:
    # echo ${num_vfs} > /sys/class/net/enp14s0f0/device/sriov_numvfs
  4. Make the Virtual Functions persistent

    To make the Virtual Functions persistent across reboots, use the editor of your choice to create a udev rule similar to the following, where you specify the intended number of VFs (in this example, 2), up to the limit supported by the network interface card. In the following example, replace enp14s0f0 with the PF network device name(s) and adjust the value of ENV{ID_NET_DRIVER} to match the driver in use:
    # vim /etc/udev/rules.d/enp14s0f0.rules
    ACTION=="add", SUBSYSTEM=="net", ENV{ID_NET_DRIVER}=="ixgbe", ATTR{device/sriov_numvfs}="2"
    
    This will ensure the feature is enabled at boot-time.
  5. Inspect the new Virtual Functions

    Using the lspci command, list the newly added Virtual Functions attached to the Intel 82576 network device. (Alternatively, use grep to search for Virtual Function, to search for devices that support Virtual Functions.)
    # lspci | grep 82576
    0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    0b:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.6 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:10.7 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.4 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    0b:11.5 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
    The identifier for the PCI device is found with the -n parameter of the lspci command. The Physical Functions correspond to 0b:00.0 and 0b:00.1. All Virtual Functions have Virtual Function in the description.
  6. Verify devices exist with virsh

    The libvirt service must recognize the device before adding a device to a virtual machine. libvirt uses a similar notation to the lspci output. All punctuation characters, : and ., in lspci output are changed to underscores (_).
    Use the virsh nodedev-list command and the grep command to filter the Intel 82576 network device from the list of available host devices. 0b is the filter for the Intel 82576 network devices in this example. This may vary for your system and may result in additional devices.
    # virsh nodedev-list | grep 0b
    pci_0000_0b_00_0
    pci_0000_0b_00_1
    pci_0000_0b_10_0
    pci_0000_0b_10_1
    pci_0000_0b_10_2
    pci_0000_0b_10_3
    pci_0000_0b_10_4
    pci_0000_0b_10_5
    pci_0000_0b_10_6
    pci_0000_0b_10_7
    pci_0000_0b_11_0
    pci_0000_0b_11_1
    pci_0000_0b_11_2
    pci_0000_0b_11_3
    pci_0000_0b_11_4
    pci_0000_0b_11_5
    The PCI addresses for the Virtual Functions and Physical Functions should be in the list.
  7. Get device details with virsh

    In the following example, pci_0000_03_00_0 is one of the Physical Functions and pci_0000_03_11_5 is one of its corresponding Virtual Functions. Use the virsh nodedev-dumpxml command to get device details for both devices.
    # virsh nodedev-dumpxml pci_0000_03_00_0
    <device>
      <name>pci_0000_03_00_0</name>
      <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0</path>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>3</bus>
        <slot>0</slot>
        <function>0</function>
        <product id='0x10c9'>82576 Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='virt_functions'>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x4'/>
          <address domain='0x0000' bus='0x03' slot='0x10' function='0x6'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x2'/>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x4'/>
        </capability>
        <iommuGroup number='14'>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
        </iommuGroup>
      </capability>
    </device>
    # virsh nodedev-dumpxml pci_0000_03_11_5
    <device>
      <name>pci_0000_03_11_5</name>
      <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:11.5</path>
      <parent>pci_0000_00_01_0</parent>
      <driver>
        <name>igbvf</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>3</bus>
        <slot>17</slot>
        <function>5</function>
        <product id='0x10ca'>82576 Virtual Function</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
        <capability type='phys_function'>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
        </capability>
        <iommuGroup number='35'>
          <address domain='0x0000' bus='0x03' slot='0x11' function='0x5'/>
        </iommuGroup>
      </capability>
    </device>
    This example adds the Virtual Function pci_0000_03_10_2 to the virtual machine in Step 8. Note the bus, slot and function parameters of the Virtual Function: these are required for adding the device.
    Copy these parameters into a temporary XML file, such as /tmp/new-interface.xml for example.
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x2'/>
         </source>
       </interface>

    Note

    When the virtual machine starts, it should see a network device of the type provided by the physical adapter, with the configured MAC address. This MAC address will remain unchanged across host and guest reboots.
    The following <interface> example shows the syntax for the optional <mac address>, <virtualport>, and <vlan> elements. In practice, use either the <vlan> or the <virtualport> element, not both at the same time; both are shown here only for illustration:
    ...
     <devices>
       ...
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0' bus='11' slot='16' function='0'/>
         </source>
         <mac address='52:54:00:6d:90:02'/>
         <vlan>
            <tag id='42'/>
         </vlan>
         <virtualport type='802.1Qbh'>
           <parameters profileid='finance'/>
         </virtualport>
       </interface>
       ...
     </devices>
    If you do not specify a MAC address, one will be automatically generated. The <virtualport> element is only used when connecting to an 802.1Qbh hardware switch. The <vlan> element will transparently put the guest's device on the VLAN tagged 42.
  8. Add the Virtual Function to the virtual machine

    Add the Virtual Function to the virtual machine using the following command with the temporary file created in the previous step. This attaches the new device immediately and saves it for subsequent guest restarts.
    virsh attach-device MyGuest /tmp/new-interface.xml --live --config
    
    Specifying the --live option with virsh attach-device attaches the new device to the running guest. Using the --config option ensures the new device is available after future guest restarts.

    Note

    The --live option is only accepted when the guest is running. virsh will return an error if the --live option is used on a non-running guest.
The virtual machine detects a new network interface card. This new card is the Virtual Function of the SR-IOV device.
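One way to confirm this from within the guest is to list its PCI devices; the assigned VF typically appears with Virtual Function in its description:
# lspci | grep -i "Virtual Function"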

16.2.3. Configuring PCI Assignment with SR-IOV Devices

SR-IOV network cards provide multiple VFs that can each be individually assigned to guest virtual machines using PCI device assignment. Once assigned, each behaves as a full physical network device. This permits many guest virtual machines to gain the performance advantage of direct PCI device assignment, while only using a single slot on the host physical machine.
These VFs can be assigned to guest virtual machines in the traditional manner using the <hostdev> element. However, SR-IOV VF network devices do not have permanent unique MAC addresses, which causes problems where the guest virtual machine's network settings need to be re-configured each time the host physical machine is rebooted. To fix this, you need to set the VF's MAC address prior to assigning it to the guest virtual machine after every boot of the host physical machine. In order to assign this MAC address, as well as other options, see the following procedure:

Procedure 16.9. Configuring MAC addresses, vLAN, and virtual ports for assigning PCI devices on SR-IOV

The <hostdev> element cannot be used for function-specific items like MAC address assignment, vLAN tag ID assignment, or virtual port assignment, because the <mac>, <vlan>, and <virtualport> elements are not valid children for <hostdev>. Instead, these elements can be used with the hostdev interface type: <interface type='hostdev'>. This device type behaves as a hybrid of an <interface> and <hostdev>. Thus, before assigning the PCI device to the guest virtual machine, libvirt initializes the network-specific hardware/switch that is indicated (such as setting the MAC address, setting a vLAN tag, or associating with an 802.1Qbh switch) in the guest virtual machine's XML configuration file. For information on setting the vLAN tag, see Section 17.16, “Setting vLAN Tags”.
  1. Gather information

    In order to use <interface type='hostdev'>, you must have an SR-IOV-capable network card, host physical machine hardware that supports either the Intel VT-d or AMD IOMMU extensions, and you must know the PCI address of the VF that you wish to assign.
  2. Shut down the guest virtual machine

    Using virsh shutdown command, shut down the guest virtual machine (here named guestVM).
    # virsh shutdown guestVM
  3. Open the XML file for editing

    # virsh edit guestVM
    Optional: For the XML configuration file that was created by the virsh save command, run:
    # virsh save-image-edit guestVM.xml --running 
    The configuration file for guestVM opens in your default editor. For more information, see Section 20.7.5, “Editing the Guest Virtual Machine Configuration”.
  4. Edit the XML file

    Update the configuration file (guestVM.xml) to have a <devices> entry similar to the following:
    
     <devices>
       ...
       <interface type='hostdev' managed='yes'>
         <source>
           <address type='pci' domain='0x0' bus='0x00' slot='0x07' function='0x0'/> <!--these values can be decimal as well-->
         </source>
         <mac address='52:54:00:6d:90:02'/>                                         <!--sets the mac address-->
         <virtualport type='802.1Qbh'>                                              <!--sets the virtual port for the 802.1Qbh switch-->
           <parameters profileid='finance'/>
         </virtualport>
         <vlan>                                                                     <!--sets the vlan tag-->
          <tag id='42'/>
         </vlan>
       </interface>
       ...
     </devices>
    
    

    Figure 16.11. Sample domain XML for hostdev interface type

    Note

    If you do not provide a MAC address, one will be automatically generated, just as with any other type of interface device. In addition, the <virtualport> element is only used if you are connecting to an 802.1Qbh hardware switch. 802.1Qbg (also known as "VEPA") switches are currently not supported.
  5. Restart the guest virtual machine

    Run the virsh start command to restart the guest virtual machine you shut down in step 2. See Section 20.6, “Starting, Resuming, and Restoring a Virtual Machine” for more information.
     # virsh start guestVM 
    When the guest virtual machine starts, it sees the network device provided to it by the physical host machine's adapter, with the configured MAC address. This MAC address remains unchanged across guest virtual machine and host physical machine reboots.
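    To confirm that the configured MAC address is in effect, you can check the guest's live XML from the host. The following is a minimal sketch using the guest name from this procedure:
    # virsh dumpxml guestVM | grep 'mac address'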

16.2.4. Setting PCI device assignment from a pool of SR-IOV virtual functions

Hard coding the PCI addresses of particular Virtual Functions (VFs) into a guest's configuration has two serious limitations:
  • The specified VF must be available any time the guest virtual machine is started. Therefore, the administrator must permanently assign each VF to a single guest virtual machine (or modify the configuration file for every guest virtual machine to specify a currently unused VF's PCI address each time every guest virtual machine is started).
  • If the guest virtual machine is moved to another host physical machine, that host physical machine must have exactly the same hardware in the same location on the PCI bus (or the guest virtual machine configuration must be modified prior to start).
It is possible to avoid both of these problems by creating a libvirt network with a device pool containing all the VFs of an SR-IOV device. Once that is done, configure the guest virtual machine to reference this network. Each time the guest is started, a single VF will be allocated from the pool and assigned to the guest virtual machine. When the guest virtual machine is stopped, the VF will be returned to the pool for use by another guest virtual machine.

Procedure 16.10. Creating a device pool

  1. Shut down the guest virtual machine

    Using virsh shutdown command, shut down the guest virtual machine, here named guestVM.
    # virsh shutdown guestVM
  2. Create a configuration file

    Using your editor of choice, create an XML file (named passthrough.xml, for example) in the /tmp directory. Make sure to replace pf dev='eth3' with the netdev name of your own SR-IOV device's Physical Function (PF).
    The following is an example network definition that will make available a pool of all VFs for the SR-IOV adapter with its PF at eth3 on the host physical machine:
          
    <network>
       <name>passthrough</name> <!-- This is the network name; it matches the file name used in this example -->
       <forward mode='hostdev' managed='yes'>
         <pf dev='myNetDevName'/>  <!-- Use the netdev name of your SR-IOV devices PF here -->
       </forward>
    </network>
          
    
    

    Figure 16.12. Sample network definition domain XML

  3. Load the new XML file

    Enter the following command, replacing /tmp/passthrough.xml with the name and location of your XML file you created in the previous step:
    # virsh net-define /tmp/passthrough.xml
  4. Start the network

    Run the following commands, replacing passthrough with the name of the network defined in your XML file, to set the network to start automatically and then start it:
     # virsh net-autostart passthrough
     # virsh net-start passthrough
  5. Re-start the guest virtual machine

    Run the virsh start command to restart the guest virtual machine you shut down in the first step (the example uses guestVM as the guest virtual machine's domain name). See Section 20.6, “Starting, Resuming, and Restoring a Virtual Machine” for more information.
     # virsh start guestVM 
  6. Initiating passthrough for devices

    Although only a single device is shown, libvirt will automatically derive the list of all VFs associated with that PF the first time a guest virtual machine is started with an interface definition in its domain XML like the following:
             
    <interface type='network'>
   <source network='passthrough'/>
    </interface>
          
    
    

    Figure 16.13. Sample domain XML for interface network definition

  7. Verification

    You can verify this by running the virsh net-dumpxml passthrough command after starting the first guest that uses the network; you will get output similar to the following:
          
    <network connections='1'>
       <name>passthrough</name>
       <uuid>a6b49429-d353-d7ad-3185-4451cc786437</uuid>
       <forward mode='hostdev' managed='yes'>
         <pf dev='eth3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x5'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x7'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x1'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x3'/>
         <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x5'/>
       </forward>
    </network>
          
    
    

    Figure 16.14. XML dump file passthrough contents

16.2.5. SR-IOV Restrictions

SR-IOV is only thoroughly tested with the following devices:
  • Intel® 82576NS Gigabit Ethernet Controller (igb driver)
  • Intel® 82576EB Gigabit Ethernet Controller (igb driver)
  • Intel® 82599ES 10 Gigabit Ethernet Controller (ixgbe driver)
  • Intel® 82599EB 10 Gigabit Ethernet Controller (ixgbe driver)
Other SR-IOV devices may work but have not been tested at the time of release.

16.3. USB Devices

This section gives the commands required for handling USB devices.

16.3.1. Assigning USB Devices to Guest Virtual Machines

Most devices, such as web cameras, card readers, disk drives, keyboards, and mice, are connected to a computer using a USB port and cable. There are two ways to pass such devices to a guest virtual machine:
  • Using USB passthrough - this requires the device to be physically connected to the host physical machine that is hosting the guest virtual machine. SPICE is not needed in this case. USB devices on the host can be passed to the guest on the command line or in virt-manager, as illustrated by the sketch after this list. See Section 19.3.2, “Attaching USB Devices to a Guest Virtual Machine” for virt-manager directions. Note that the virt-manager directions are not suitable for hot plugging or hot unplugging devices. If you want to hot plug or hot unplug a USB device, see Procedure 20.4, “Hot plugging USB devices for use by the guest virtual machine”.
  • Using USB redirection - USB redirection is best used in cases where the host physical machine is running in a data center. The user connects to the guest virtual machine from a local machine or thin client that runs a SPICE client. The user can attach any USB device to the thin client, and the SPICE client redirects the device to the host physical machine in the data center so that it can be used by the guest virtual machine running there. For instructions using virt-manager, see Section 19.3.3, “USB Redirection”.
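
The following sketch illustrates the passthrough approach referred to in the first item above: a device description that could be attached with the virsh attach-device command. This is a minimal sketch only; the vendor and product IDs are hypothetical placeholders, which you would replace with the values reported by lsusb for your device:

  <hostdev mode='subsystem' type='usb' managed='yes'>  <!-- USB passthrough of one host device -->
    <source>
      <vendor id='0x1234'/>   <!-- hypothetical vendor ID; obtain the real value with lsusb -->
      <product id='0xbeef'/>  <!-- hypothetical product ID -->
    </source>
  </hostdev>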

16.3.2. Setting a Limit on USB Device Redirection

To filter out certain devices from redirection, pass the filter property to -device usb-redir. The filter property takes a string consisting of filter rules; the format for a rule is:
<class>:<vendor>:<product>:<version>:<allow>
Use the value -1 to accept any value for a particular field. You may use multiple rules on the same command line, using | as a separator. Note that if a device matches none of the rules, it will not be redirected.
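For example, the following rule string (a minimal illustration consistent with the format above) allows redirection of any USB mass storage device (class 0x08) and blocks all other devices:
 0x08:-1:-1:-1:1|-1:-1:-1:-1:0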

Example 16.1. An example of limiting redirection with a guest virtual machine

  1. Prepare a guest virtual machine.
  2. Add the following code excerpt to the guest virtual machine's domain XML file:
        <redirdev bus='usb' type='spicevmc'>
          <alias name='redir0'/>
          <address type='usb' bus='0' port='3'/>
        </redirdev>
        <redirfilter>
          <usbdev class='0x08' vendor='0x1234' product='0xBEEF' version='2.0' allow='yes'/>
          <usbdev class='-1' vendor='-1' product='-1' version='-1' allow='no'/>
        </redirfilter>
    
  3. Start the guest virtual machine and confirm the setting changes by running the following:
     # ps -ef | grep $guest_name
     -device usb-redir,chardev=charredir0,id=redir0,filter=0x08:0x1234:0xBEEF:0x0200:1|-1:-1:-1:-1:0,bus=usb.0,port=3
  4. Plug a USB device into a host physical machine, and use virt-manager to connect to the guest virtual machine.
  5. Click USB device selection in the menu, which will produce the following message: "Some USB devices are blocked by host policy". Click OK to confirm and continue.
    The filter takes effect.
  6. To make sure that the filter captures properly, check the USB device's vendor and product IDs, then make the following changes in the guest virtual machine's domain XML to allow USB redirection:
       <redirfilter>
         <usbdev class='0x08' vendor='0x0951' product='0x1625' version='2.0' allow='yes'/>
         <usbdev allow='no'/>
       </redirfilter>
    
  7. Restart the guest virtual machine, then use virt-viewer to connect to the guest virtual machine. The USB device will now redirect traffic to the guest virtual machine.

16.4. Configuring Device Controllers

Depending on the guest virtual machine architecture, some device buses can appear more than once, with a group of virtual devices tied to a virtual controller. Normally, libvirt can automatically infer such controllers without requiring explicit XML markup, but in some cases it is better to explicitly set a virtual controller element.

  ...
  <devices>
    <controller type='ide' index='0'/>
    <controller type='virtio-serial' index='0' ports='16' vectors='4'/>
    <controller type='virtio-serial' index='1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </controller>
    ...
  </devices>
  ...

Figure 16.15. Domain XML example for virtual controllers

Each controller has a mandatory attribute <controller type>, which must be one of:
  • ide
  • fdc
  • scsi
  • sata
  • usb
  • ccid
  • virtio-serial
  • pci
The <controller> element has a mandatory attribute <controller index>, which is the decimal integer describing in which order the bus controller is encountered (for use in controller attributes of <address> elements). When <controller type='virtio-serial'>, there are two additional optional attributes (named ports and vectors), which control how many devices can be connected through the controller.
When <controller type='scsi'>, there is an optional model attribute, which can have the following values:
  • auto
  • buslogic
  • ibmvscsi
  • lsilogic
  • lsisas1068
  • lsisas1078
  • virtio-scsi
  • vmpvscsi
When <controller type='usb'>, there is an optional model attribute, which can have the following values:
  • piix3-uhci
  • piix4-uhci
  • ehci
  • ich9-ehci1
  • ich9-uhci1
  • ich9-uhci2
  • ich9-uhci3
  • vt82c686b-uhci
  • pci-ohci
  • nec-xhci
Note that if the USB bus needs to be explicitly disabled for the guest virtual machine, <model='none'> may be used.
For controllers that are themselves devices on a PCI or USB bus, an optional sub-element <address> can specify the exact relationship of the controller to its master bus, with semantics as shown in Section 16.5, “Setting Addresses for Devices”.
An optional sub-element <driver> can specify driver-specific options. Currently, it only supports the queues attribute, which specifies the number of queues for the controller. For best performance, it is recommended to specify a value matching the number of vCPUs.
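For example, a virtio-scsi controller with multiple queues could be defined as follows (a minimal sketch; the queues value of 4 assumes a guest with four vCPUs):

  <controller type='scsi' index='0' model='virtio-scsi'>
    <driver queues='4'/>  <!-- assumed to match the number of guest vCPUs -->
  </controller>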
USB companion controllers have an optional sub-element <master> to specify the exact relationship of the companion to its master controller. A companion controller is on the same bus as its master, so the companion index value should be equal.
An example XML which can be used is as follows:
   
     ...
  <devices>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0' bus='0' slot='4' function='7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0' bus='0' slot='4' function='0' multifunction='on'/>
    </controller>
    ...
  </devices>
  ...
   

Figure 16.16. Domain XML example for USB controllers

PCI controllers have an optional model attribute with the following possible values:
  • pci-root
  • pcie-root
  • pci-bridge
  • dmi-to-pci-bridge
For machine types which provide an implicit PCI bus, the pci-root controller with index='0' is auto-added and required to use PCI devices. pci-root has no address. PCI bridges are auto-added if there are too many devices to fit on the one bus provided by model='pci-root', or a PCI bus number greater than zero was specified. PCI bridges can also be specified manually, but their addresses should only refer to PCI buses provided by already specified PCI controllers. Leaving gaps in the PCI controller indexes might lead to an invalid configuration. The following XML example can be added to the <devices> section:

  ...
  <devices>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='pci' index='1' model='pci-bridge'>
      <address type='pci' domain='0' bus='0' slot='5' function='0' multifunction='off'/>
    </controller>
  </devices>
  ...

Figure 16.17. Domain XML example for PCI bridge

For machine types which provide an implicit PCI Express (PCIe) bus (for example, the machine types based on the Q35 chipset), the pcie-root controller with index='0' is auto-added to the domain's configuration. pcie-root also has no address, but provides 31 slots (numbered 1-31) and can only be used to attach PCIe devices. In order to connect standard PCI devices on a system which has a pcie-root controller, a pci controller with model='dmi-to-pci-bridge' is automatically added. A dmi-to-pci-bridge controller plugs into a PCIe slot (as provided by pcie-root), and itself provides 31 standard PCI slots (which are not hot-pluggable). In order to have hot-pluggable PCI slots in the guest system, a pci-bridge controller will also be automatically created and connected to one of the slots of the auto-created dmi-to-pci-bridge controller; all guest devices with PCI addresses that are auto-determined by libvirt will be placed on this pci-bridge device.
   
     ...
  <devices>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <address type='pci' domain='0' bus='0' slot='0xe' function='0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0' bus='1' slot='1' function='0'/>
    </controller>
  </devices>
  ...
   

Figure 16.18. Domain XML example for PCIe (PCI express)

The following XML configuration is used for USB 3.0 / xHCI emulation:
   
     ...
  <devices>
    <controller type='usb' index='3' model='nec-xhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0f' function='0x0'/>
    </controller>
  </devices>
    ...

Figure 16.19. Domain XML example for USB3/xHCI devices

16.5. Setting Addresses for Devices

Many devices have an optional <address> sub-element which is used to describe where the device is placed on the virtual bus presented to the guest virtual machine. If an address (or any optional attribute within an address) is omitted on input, libvirt will generate an appropriate address; but an explicit address is required if more control over layout is required. For domain XML device examples that include an <address> element, see Figure 16.9, “XML example for PCI device assignment”.
Every address has a mandatory attribute type that describes which bus the device is on. The choice of which address to use for a given device is constrained in part by the device and the architecture of the guest virtual machine. For example, a <disk> device uses type='drive', while a <console> device would use type='pci' on i686 or x86_64 guest virtual machine architectures. Each address type has further optional attributes that control where on the bus the device will be placed as described in the table:

Table 16.1. Supported device address types

Address type Description
type='pci' PCI addresses have the following additional attributes:
  • domain (a 2-byte hex integer, not currently used by qemu)
  • bus (a hex value between 0 and 0xff, inclusive)
  • slot (a hex value between 0x0 and 0x1f, inclusive)
  • function (a value between 0 and 7, inclusive)
  • multifunction controls turning on the multifunction bit for a particular slot/function in the PCI control register. By default, it is set to 'off', but should be set to 'on' for function 0 of a slot that will have multiple functions used.
type='drive' Drive addresses have the following additional attributes:
  • controller (a 2-digit controller number)
  • bus (a 2-digit bus number)
  • target (a 2-digit target number)
  • unit (a 2-digit unit number on the bus)
type='virtio-serial' Each virtio-serial address has the following additional attributes:
  • controller (a 2-digit controller number)
  • bus (a 2-digit bus number)
  • slot (a 2-digit slot within the bus)
type='ccid' A CCID address, for smart-cards, has the following additional attributes:
  • bus (a 2-digit bus number)
  • slot attribute (a 2-digit slot within the bus)
type='usb' USB addresses have the following additional attributes:
  • bus (a hex value between 0 and 0xfff, inclusive)
  • port (a dotted notation of up to four octets, such as 1.2 or 2.1.3.1)
type='isa' ISA addresses have the following additional attributes:
  • iobase
  • irq
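
For illustration, the following fragments show an explicit drive address on a SCSI disk and an explicit PCI address on a network interface. The controller, target, unit, slot, and image path values are assumptions chosen for this example:

  <disk type='file' device='disk'>
    <source file='/var/lib/libvirt/images/guest.img'/>  <!-- assumed image path -->
    <target dev='sda' bus='scsi'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <interface type='network'>
    <source network='default'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </interface>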

16.6. Random Number Generator Device

Random number generators are very important for operating system security. For securing virtual operating systems, Red Hat Enterprise Linux 7 includes virtio-rng, a virtual hardware random number generator device that can provide the guest with fresh entropy on request.
On the host physical machine, the hardware RNG interface creates a chardev at /dev/hwrng, which can be opened and then read to fetch entropy from the host physical machine. In co-operation with the rngd daemon, the entropy from the host physical machine can be routed to the guest virtual machine's /dev/random, which is the primary source of randomness.
Using a random number generator is particularly useful when devices such as keyboards and mice do not generate sufficient entropy on the guest virtual machine. The virtual random number generator device allows the host physical machine to pass through entropy to guest virtual machine operating systems. This procedure can be performed using either the command line or the virt-manager interface. For instructions, see below. For more information about virtio-rng, see Red Hat Enterprise Linux Virtual Machines: Access to Random Numbers Made Easy.

Procedure 16.11. Implementing virtio-rng using the Virtual Machine Manager

  1. Shut down the guest virtual machine.
  2. Select the guest virtual machine and from the Edit menu, select Virtual Machine Details, to open the Details window for the specified guest virtual machine.
  3. Click the Add Hardware button.
  4. In the Add New Virtual Hardware window, select RNG to open the Random Number Generator window.
    Random Number Generator window

    Figure 16.20. Random Number Generator window

    Enter the intended parameters and click Finish when done. The parameters are explained in virtio-rng elements.

Procedure 16.12. Implementing virtio-rng using command-line tools

  1. Shut down the guest virtual machine.
  2. Using the virsh edit domain-name command, open the XML file for the intended guest virtual machine.
  3. Edit the <devices> element to include the following:
    
      ...
      <devices>
        <rng model='virtio'>
          <rate period='2000' bytes='1234'/>
          <backend model='random'>/dev/random</backend>
          <!-- OR -->
          <backend model='egd' type='udp'>
            <source mode='bind' service='1234'/>
            <source mode='connect' host='1.2.3.4' service='1234'/>
          </backend>
        </rng>
      </devices>
      ...

    Figure 16.21. Random number generator device

    The random number generator device allows the following XML attributes and elements:

    virtio-rng elements

    • <model> - The required model attribute specifies what type of RNG device is provided.
    • <backend model> - The <backend> element specifies the source of entropy to be used for the guest. The source model is configured using the model attribute. Supported source models include 'random' and 'egd' .
      • <backend model='random'> - This <backend> type expects a non-blocking character device as input. Examples of such devices are /dev/random and /dev/urandom. The file name is specified as contents of the <backend> element. When no file name is specified the hypervisor default is used.
      • <backend model='egd'> - This back end connects to a source using the EGD protocol. The source is specified as a character device. See character device host physical machine interface for more information.
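
     As a quick check that the guest is receiving entropy (assuming a Linux guest with the virtio-rng device attached), the available entropy can be inspected from inside the guest:
     # cat /proc/sys/kernel/random/entropy_avail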

16.7. Assigning GPU Devices

To assign a GPU to a guest, use one of the following methods:
  • GPU PCI Device Assignment - Using this method, it is possible to remove a GPU device from the host and assign it to a single guest.
  • NVIDIA vGPU Assignment - This method makes it possible to create multiple mediated devices from a physical GPU, and assign these devices as virtual GPUs to multiple guests. This is only supported on selected NVIDIA GPUs, and only one mediated device can be assigned to a single guest.

16.7.1. GPU PCI Device Assignment

Red Hat Enterprise Linux 7 supports PCI device assignment of the following PCIe-based GPU devices as non-VGA graphics devices:
  • NVIDIA Quadro K-Series, M-Series, P-Series, and later architectures (models 2000 series or later)
  • NVIDIA Tesla K-Series, M-Series, and later architectures

Note

The number of GPUs that can be attached to a VM is limited by the maximum number of assigned PCI devices, which in RHEL 7 is currently 32. However, attaching multiple GPUs to a virtual machine is likely to cause problems with memory-mapped I/O (MMIO) on the guest, which may result in the GPUs not being available to the VM.
To work around these problems, set a larger 64-bit MMIO space and configure the vCPU physical address bits to make the extended 64-bit MMIO space addressable.
To assign a GPU to a guest virtual machine, you must enable the I/O Memory Management Unit (IOMMU) on the host machine, identify the GPU device by using the lspci command, detach the device from the host, attach it to the guest, and configure Xorg on the guest, as described in the following procedures:

Procedure 16.13. Enable IOMMU support in the host machine kernel

  1. Edit the kernel command line

    For an Intel VT-d system, IOMMU is activated by adding the intel_iommu=on and iommu=pt parameters to the kernel command line. For an AMD-Vi system, only the iommu=pt parameter is needed. To enable this option, edit or add the GRUB_CMDLINE_LINUX line to the /etc/sysconfig/grub configuration file as follows:
    GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_VolGroup00/LogVol01
    vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_VolGroup_1/root
    vconsole.keymap=us $([ -x /usr/sbin/rhcrashkernel-param ]  &&
    /usr/sbin/rhcrashkernel-param || :) rhgb quiet intel_iommu=on iommu=pt"
    

    Note

    For further information on IOMMU, see Appendix E, Working with IOMMU Groups.
  2. Regenerate the boot loader configuration

    For the changes to the kernel command line to apply, regenerate the boot loader configuration using the grub2-mkconfig command:
    # grub2-mkconfig -o /etc/grub2.cfg
    Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
  3. Reboot the host

    For the changes to take effect, reboot the host machine:
    # reboot

Procedure 16.14. Excluding the GPU device from binding to the host physical machine driver

For GPU assignment, it is recommended to exclude the device from binding to host drivers, as these drivers often do not support dynamic unbinding of the device.
  1. Identify the PCI bus address

    To identify the PCI bus address and IDs of the device, run the following lspci command. In this example, a VGA controller such as an NVIDIA Quadro or GRID card is used:
    # lspci -Dnn | grep VGA
    0000:02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106GL [Quadro K4000] [10de:11fa] (rev a1)
    
    The resulting search reveals that the PCI bus address of this device is 0000:02:00.0 and the PCI IDs for the device are 10de:11fa.
  2. Prevent the native host machine driver from using the GPU device

    To prevent the native host machine driver from using the GPU device, you can use a PCI ID with the pci-stub driver. To do this, append the pci-stub.ids option, with the PCI IDs as its value, to the GRUB_CMDLINE_LINUX line located in the /etc/sysconfig/grub configuration file, for example as follows:
    GRUB_CMDLINE_LINUX="rd.lvm.lv=vg_VolGroup00/LogVol01
    vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_VolGroup_1/root
    vconsole.keymap=us $([ -x /usr/sbin/rhcrashkernel-param ]  &&
    /usr/sbin/rhcrashkernel-param || :) rhgb quiet intel_iommu=on iommu=pt pci-stub.ids=10de:11fa"
    
    To add additional PCI IDs for pci-stub, separate them with a comma.
  3. Regenerate the boot loader configuration

    Regenerate the boot loader configuration using the grub2-mkconfig command to include this option:
    # grub2-mkconfig -o /etc/grub2.cfg
    Note that if you are using a UEFI-based host, the target file should be /etc/grub2-efi.cfg.
  4. Reboot the host machine

    In order for the changes to take effect, reboot the host machine:
    # reboot

Procedure 16.15. Optional: Editing the GPU IOMMU configuration

Prior to attaching the GPU device, editing its IOMMU configuration may be needed for the GPU to work properly on the guest.
  1. Display the XML information of the GPU

    To display the settings of the GPU in XML form, you first need to convert its PCI bus address to libvirt-compatible format by prepending pci_ and converting delimiters to underscores. In this example, the GPU PCI device identified with the 0000:02:00.0 bus address (as obtained in the previous procedure) becomes pci_0000_02_00_0. Use the libvirt address of the device with the virsh nodedev-dumpxml command to display its XML configuration:
    # virsh nodedev-dumpxml pci_0000_02_00_0
    
    <device>
     <name>pci_0000_02_00_0</name>
     <path>/sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0</path>
     <parent>pci_0000_00_03_0</parent>
     <driver>
      <name>pci-stub</name>
     </driver>
     <capability type='pci'>
      <domain>0</domain>
      <bus>2</bus>
      <slot>0</slot>
      <function>0</function>
      <product id='0x11fa'>GK106GL [Quadro K4000]</product>
      <vendor id='0x10de'>NVIDIA Corporation</vendor>
         <!-- pay attention to the following lines -->
      <iommuGroup number='13'>
       <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
       <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </iommuGroup>
      <pci-express>
       <link validity='cap' port='0' speed='8' width='16'/>
       <link validity='sta' speed='2.5' width='16'/>
      </pci-express>
     </capability>
    </device>
    Note the <iommuGroup> element of the XML. The iommuGroup indicates a set of devices that are considered isolated from other devices due to IOMMU capabilities and PCI bus topologies. All of the endpoint devices within the iommuGroup (meaning devices that are not PCIe root ports, bridges, or switch ports) need to be unbound from the native host drivers in order to be assigned to a guest. In the example above, the group is composed of the GPU device (0000:02:00.0) as well as the companion audio device (0000:02:00.1). For more information, see Appendix E, Working with IOMMU Groups.
  2. Adjust IOMMU settings

    In this example, assignment of NVIDIA audio functions is not supported due to hardware issues with legacy interrupt support. In addition, the GPU audio function is generally not useful without the GPU itself. Therefore, in order to assign the GPU to a guest, the audio function must first be detached from native host drivers. This can be done, for example, by adding the audio function's PCI ID to the pci-stub.ids option, as described in Procedure 16.14, “Excluding the GPU device from binding to the host physical machine driver”.

Procedure 16.16. Attaching the GPU

The GPU can be attached to the guest using any of the following methods:
  1. Using the Virtual Machine Manager interface. For details, see Section 16.1.2, “Assigning a PCI Device with virt-manager”.
  2. Creating an XML configuration fragment for the GPU and attaching it with the virsh attach-device command:
    1. Create an XML for the device, similar to the following:
      
      <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
       </source>
      </hostdev>
    2. Save this to a file and run virsh attach-device [domain] [file] --persistent to include the XML in the guest configuration (an example command is shown after this list). Note that the assigned GPU is added in addition to the existing emulated graphics device in the guest machine. The assigned GPU is handled as a secondary graphics device in the virtual machine. Assignment as a primary graphics device is not supported and emulated graphics devices in the guest's XML should not be removed.
  3. Editing the guest XML configuration using the virsh edit command and adding the appropriate XML segment manually.
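For example, assuming the XML fragment shown above was saved as /tmp/gpu-hostdev.xml and the guest domain is named guestVM (both names are hypothetical), the device could be attached as follows:
 # virsh attach-device guestVM /tmp/gpu-hostdev.xml --persistent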

Procedure 16.17. Modifying the Xorg configuration on the guest

The GPU's PCI bus address on the guest will be different than on the host. To enable the guest to use the GPU properly, configure the guest's Xorg display server to use the assigned GPU address:
  1. In the guest, use the lspci command to determine the PCI bus address of the GPU:
    # lspci | grep VGA
    00:02.0 VGA compatible controller: Device 1234:111
    00:09.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1)
    
    In this example, the bus address is 00:09.0.
  2. In the /etc/X11/xorg.conf file on the guest, add a BusID option with the detected address adjusted as follows:
    		Section "Device"
    		    Identifier     "Device0"
    		    Driver         "nvidia"
    		    VendorName     "NVIDIA Corporation"
    		    BusID          "PCI:0:9:0"
    		EndSection
    

    Important

    If the bus address detected in Step 1 is hexadecimal, you need to convert the values between delimiters to the decimal system. For example, 00:0a.0 should be converted into PCI:0:10:0.

Note

When using an assigned NVIDIA GPU in the guest, only the NVIDIA drivers are supported. Other drivers may not work and may generate errors. For a Red Hat Enterprise Linux 7 guest, the nouveau driver can be blacklisted using the option modprobe.blacklist=nouveau on the kernel command line during install. For information on other guest virtual machines, see the operating system's specific documentation.
Depending on the guest operating system, with the NVIDIA drivers loaded, the guest may support using both the emulated graphics and assigned graphics simultaneously or may disable the emulated graphics. Note that access to the assigned graphics framebuffer is not provided by applications such as virt-manager. If the assigned GPU is not connected to a physical display, guest-based remoting solutions may be necessary to access the GPU desktop. As with all PCI device assignment, migration of a guest with an assigned GPU is not supported and each GPU is owned exclusively by a single guest. Depending on the guest operating system, hot plug support of GPUs may be available.

16.7.2. NVIDIA vGPU Assignment

The NVIDIA vGPU feature makes it possible to divide a physical GPU device into multiple virtual devices referred to as mediated devices. These mediated devices can then be assigned to multiple guests as virtual GPUs. As a result, these guests share the performance of a single physical GPU.

Important

This feature is only available on a limited set of NVIDIA GPUs. For an up-to-date list of these devices, see the NVIDIA GPU Software Documentation.

16.7.2.1. NVIDIA vGPU Setup

To set up the vGPU feature, you first need to obtain NVIDIA vGPU drivers for your GPU device, then create mediated devices, and assign them to the intended guest machines:
  1. Obtain the NVIDIA vGPU drivers and install them on your system. For instructions, see the NVIDIA documentation.
  2. If the NVIDIA software installer did not create the /etc/modprobe.d/nvidia-installer-disable-nouveau.conf file, create a .conf file (of any name) in the /etc/modprobe.d/ directory. Add the following lines in the file:
    blacklist nouveau
    options nouveau modeset=0
    
    
  3. Regenerate the initial ramdisk for the current kernel, then reboot:
    # dracut --force
    # reboot
    If you need to use a prior supported kernel version with mediated devices, regenerate the initial ramdisk for all installed kernel versions:
    # dracut --regenerate-all --force
    # reboot
  4. Check that the nvidia_vgpu_vfio module has been loaded by the kernel and that the nvidia-vgpu-mgr.service service is running.
    # lsmod | grep nvidia_vgpu_vfio
    nvidia_vgpu_vfio 45011 0
    nvidia 14333621 10 nvidia_vgpu_vfio
    mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
    vfio 32695 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
    # systemctl status nvidia-vgpu-mgr.service
    nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
       Loaded: loaded (/usr/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2018-03-16 10:17:36 CET; 5h 8min ago
     Main PID: 1553 (nvidia-vgpu-mgr)
     [...]
    
  5. Write a device UUID to /sys/class/mdev_bus/pci_dev/mdev_supported_types/type-id/create, where pci_dev is the PCI address of the host GPU, and type-id is an ID of the host GPU type.
    The following example shows how to create a mediated device of nvidia-63 vGPU type on an NVIDIA Tesla P4 card:
    # uuidgen
    30820a6f-b1a5-4503-91ca-0c10ba58692a
    # echo "30820a6f-b1a5-4503-91ca-0c10ba58692a" > /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types/nvidia-63/create
    For type-id values for specific devices, see section 1.3.1. Virtual GPU Types in Virtual GPU software documentation. Note that only Q-series NVIDIA vGPUs, such as GRID P4-2Q, are supported as mediated device GPU types on Linux guests.
  6. Add the following lines to the <devices/> sections in XML configurations of guests that you want to share the vGPU resources. Use the UUID value generated by the uuidgen command in the previous step. Each UUID can only be assigned to one guest at a time.
    
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
      <source>
        <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
      </source>
    </hostdev>
    

    Important

    For the vGPU mediated devices to work properly on the assigned guests, NVIDIA vGPU guest software licensing needs to be set up for the guests. For further information and instructions, see the NVIDIA virtual GPU software documentation.

16.7.2.2. Setting up and using the VNC console for video streaming with NVIDIA vGPU

As a Technology Preview, the Virtual Network Computing (VNC) console can be used with GPU-based mediated devices, including NVIDIA vGPU, in Red Hat Enterprise Linux 7.7 and later. As a result, you can use VNC to display the accelerated graphical output provided by an NVIDIA vGPU device.

Warning

This feature is currently only provided as a Technology Preview and is not supported by Red Hat. Therefore, using the procedure below in a production environment is heavily discouraged.
To configure vGPU output rendering in a VNC console on your virtual machine, do the following:
  1. Install NVIDIA vGPU drivers and configure NVIDIA vGPU on your system as described in Section 16.7.2.1, “NVIDIA vGPU Setup”. Ensure the mediated device's XML configuration includes the display='on' parameter. For example:
    			
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
       <source>
          <address uuid='ba26a3e2-8e1e-4f39-9de7-b26bd210268a'/>
       </source>
    </hostdev>
    			
  2. Optionally, set the VM's video model type as none. For example:
    			
    <video>
       <model type='none'/>
    </video>
    			
    If this is not specified, you receive two different display outputs - one from an emulated Cirrus or QXL card and one from NVIDIA vGPU. Also note that using model type='none' currently makes it impossible to see the boot graphical output until the drivers are initialized. As a result, the first graphical output displayed is the login screen.
  3. Ensure the XML configuration of the VM's graphics type is vnc.
    For example:
    			
    <graphics type='vnc' port='-1' autoport='yes'>
    	 <listen type='address'/>
    </graphics>
    			
  4. Start the virtual machine.
  5. Connect to the virtual machine using the VNC viewer remote desktop client.

    Note

    If the VM is set up with an emulated VGA as the primary video device and vGPU as the secondary device, use the ctrl+alt+2 keyboard shortcut to switch to the vGPU display.

16.7.2.3. Removing NVIDIA vGPU Devices

To remove a mediated vGPU device, use the following command when the device is inactive, and replace uuid with the UUID of the device, for example 30820a6f-b1a5-4503-91ca-0c10ba58692a.
# echo 1 > /sys/bus/mdev/devices/uuid/remove
Note that attempting to remove a vGPU device that is currently in use by a guest triggers the following error:
echo: write error: Device or resource busy

16.7.2.4. Querying NVIDIA vGPU Capabilities

To obtain additional information about the mediated devices on your system, such as how many mediated devices of a given type can be created, use the virsh nodedev-list --cap mdev_types and virsh nodedev-dumpxml commands. For example, the following displays available vGPU types on a Tesla P4 card:

$ virsh nodedev-list --cap mdev_types
pci_0000_01_00_0
$ virsh nodedev-dumpxml pci_0000_01_00_0
<...>
  <capability type='mdev_types'>
    <type id='nvidia-70'>
      <name>GRID P4-8A</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>1</availableInstances>
    </type>
    <type id='nvidia-69'>
      <name>GRID P4-4A</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>2</availableInstances>
    </type>
    <type id='nvidia-67'>
      <name>GRID P4-1A</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>8</availableInstances>
    </type>
    <type id='nvidia-65'>
      <name>GRID P4-4Q</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>2</availableInstances>
    </type>
    <type id='nvidia-63'>
      <name>GRID P4-1Q</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>8</availableInstances>
    </type>
    <type id='nvidia-71'>
      <name>GRID P4-1B</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>8</availableInstances>
    </type>
    <type id='nvidia-68'>
      <name>GRID P4-2A</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>4</availableInstances>
    </type>
    <type id='nvidia-66'>
      <name>GRID P4-8Q</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>1</availableInstances>
    </type>
    <type id='nvidia-64'>
      <name>GRID P4-2Q</name>
      <deviceAPI>vfio-pci</deviceAPI>
      <availableInstances>4</availableInstances>
    </type>
  </capability>
</...>

16.7.2.5. Remote Desktop Streaming Services for NVIDIA vGPU

The following remote desktop streaming services have been successfully tested for use with the NVIDIA vGPU feature on Red Hat Enterprise Linux 7:
  • HP-RGS
  • Mechdyne TGX - It is currently not possible to use Mechdyne TGX with Windows Server 2016 guests.
  • NICE DCV - When using this streaming service, Red Hat recommends using fixed resolution settings, as using dynamic resolution in some cases results in a black screen.

16.7.2.6. Setting up the VNC console for video streaming with NVIDIA vGPU

Introduction

As a Technology Preview, the Virtual Network Computing (VNC) console can be used with GPU-based mediated devices, including NVIDIA vGPU, in Red Hat Enterprise Linux 7.7 and later. As a result, you can use VNC to display the accelerated graphical output provided by an NVIDIA vGPU device.

Important

Due to being a Technology Preview, this feature is not supported by Red Hat. Therefore, using the procedure below in a production environment is heavily discouraged.

Configuration

To configure vGPU output rendering in a VNC console on your virtual machine, do the following:
  1. Install NVIDIA vGPU drivers and configure NVIDIA vGPU on your host as described in Section 16.7.2, “NVIDIA vGPU Assignment”. Ensure the mediated device's XML configuration includes the display='on' parameter. For example:
    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
     <source>
        <address uuid='ba26a3e2-8e1e-4f39-9de7-b26bd210268a'/>
     </source>
    </hostdev>
  2. Optionally, set the VM's video model type as none. For example:
    <video>
     <model type='none'/>
    </video>
  3. Ensure the XML configuration of the VM's graphics type is spice or vnc.
    An example for spice:
    <graphics type='spice' autoport='yes'>
     <listen type='address'/>
     <image compression='off'/>
    </graphics>
    An example for vnc:
    <graphics type='vnc' port='-1' autoport='yes'>
     <listen type='address'/>
    </graphics>
  4. Start the virtual machine.
  5. Connect to the virtual machine using a client appropriate to the graphics protocol you configured in the previous steps.
    • For VNC, use the VNC viewer remote desktop client. If the VM is set up with an emulated VGA as the primary video device and vGPU as the secondary, use the ctrl+alt+2 keyboard shortcut to switch to the vGPU display.
    • For SPICE, use the virt-viewer application.

Chapter 17. Virtual Networking

This chapter introduces the concepts needed to create, start, stop, remove, and modify virtual networks with libvirt.
Additional information can be found in the libvirt reference chapter.

17.1. Virtual Network Switches

Libvirt virtual networking uses the concept of a virtual network switch. A virtual network switch is a software construct that operates on a host physical machine server, to which virtual machines (guests) connect. The network traffic for a guest is directed through this switch:
Virtual network switch with two guests

Figure 17.1. Virtual network switch with two guests

Linux host physical machine servers represent a virtual network switch as a network interface. When the libvirtd daemon (libvirtd) is first installed and started, the default network interface representing the virtual network switch is virbr0.
This virbr0 interface can be viewed with the ip command like any other interface:
 $ ip addr show virbr0
 3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
     link/ether 1b:c4:94:cf:fd:17 brd ff:ff:ff:ff:ff:ff
     inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0

17.2. Bridged Mode

When using Bridged mode, all of the guest virtual machines appear within the same subnet as the host physical machine. All other physical machines on the same physical network are aware of the virtual machines, and can access the virtual machines. Bridging operates on Layer 2 of the OSI networking model.
Virtual network switch in bridged mode

Figure 17.2. Virtual network switch in bridged mode

It is possible to use multiple physical interfaces on the hypervisor by joining them together with a bond. The bond is then added to a bridge and then guest virtual machines are added onto the bridge as well. However, the bonding driver has several modes of operation, and only a few of these modes work with a bridge where virtual guest machines are in use.

Warning

When using bridged mode, the only bonding modes that should be used with a guest virtual machine are Mode 1, Mode 2, and Mode 4. Using modes 0, 3, 5, or 6 is likely to cause the connection to fail. Also note that Media-Independent Interface (MII) monitoring should be used to monitor bonding modes, as Address Resolution Protocol (ARP) monitoring does not work.
For more information on bonding modes, see related Knowledgebase article, or the Red Hat Enterprise Linux 7 Networking Guide.
For a detailed explanation of bridge_opts parameters, used to configure bridged networking mode, see the Red Hat Virtualization Administration Guide.

17.3. Network Address Translation

By default, virtual network switches operate in NAT mode. They use IP masquerading rather than Source-NAT (SNAT) or Destination-NAT (DNAT). IP masquerading enables connected guests to use the host physical machine IP address for communication to any external network. By default, computers that are placed externally to the host physical machine cannot communicate to the guests inside when the virtual network switch is operating in NAT mode, as shown in the following diagram:
Virtual network switch using NAT with two guests

Figure 17.3. Virtual network switch using NAT with two guests

Warning

Virtual network switches use NAT configured by iptables rules. Editing these rules while the switch is running is not recommended, as incorrect rules may result in the switch being unable to communicate.
If the switch is not running, you can set the public IP range for forward mode NAT in order to create a port masquerading range by running:
# iptables -j SNAT --to-source [start]-[end]

17.4. DNS and DHCP

IP information can be assigned to guests via DHCP. A pool of addresses can be assigned to a virtual network switch for this purpose. Libvirt uses the dnsmasq program for this. An instance of dnsmasq is automatically configured and started by libvirt for each virtual network switch that needs it.
Virtual network switch running dnsmasq

Figure 17.4. Virtual network switch running dnsmasq
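For reference, the DHCP pool that dnsmasq serves is defined in the virtual network's XML. The following is a minimal sketch resembling the default NAT network definition; the addresses shown match the usual defaults but may differ on your system:

 <network>
   <name>default</name>
   <forward mode='nat'/>
   <bridge name='virbr0'/>
   <ip address='192.168.122.1' netmask='255.255.255.0'>
     <dhcp>
       <range start='192.168.122.2' end='192.168.122.254'/>
     </dhcp>
   </ip>
 </network>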

17.5. Routed Mode

When using Routed mode, the virtual switch connects to the physical LAN connected to the host physical machine, passing traffic back and forth without the use of NAT. The virtual switch can examine all traffic and use the information contained within the network packets to make routing decisions. When using this mode, all of the virtual machines are in their own subnet, routed through a virtual switch. This situation is not always ideal, because other host physical machines on the physical network are not aware of the virtual machines without manual physical router configuration, and cannot access them. Routed mode operates at Layer 3 of the OSI networking model.
Virtual network switch in routed mode

Figure 17.5. Virtual network switch in routed mode

17.6. Isolated Mode

When using Isolated mode, guests connected to the virtual switch can communicate with each other, and with the host physical machine, but their traffic will not pass outside of the host physical machine, and they cannot receive traffic from outside the host physical machine. Using dnsmasq in this mode is required for basic functionality such as DHCP. However, even if this network is isolated from any physical network, DNS names are still resolved. Therefore, a situation can arise when DNS names resolve but ICMP echo request (ping) commands fail.
Virtual network switch in isolated mode

Figure 17.6. Virtual network switch in isolated mode
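A minimal network definition for isolated mode might look like the following sketch. Note the absence of a <forward> element, which is what makes the network isolated; the address range used here is an arbitrary example:

 <network>
   <name>isolated</name>
   <ip address='192.168.152.1' netmask='255.255.255.0'>
     <dhcp>
       <range start='192.168.152.2' end='192.168.152.254'/>
     </dhcp>
   </ip>
 </network>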

17.7. The Default Configuration

When the libvirtd daemon (libvirtd) is first installed, it contains an initial virtual network switch configuration in NAT mode. This configuration is used so that installed guests can communicate to the external network, through the host physical machine. The following image demonstrates this default configuration for libvirtd:
Default libvirt network configuration

Figure 17.7. Default libvirt network configuration

Note

A virtual network can be restricted to a specific physical interface. This may be useful on a physical system that has several interfaces (for example, eth0, eth1 and eth2). This is only useful in routed and NAT modes, and can be defined in the dev=<interface> option, or in virt-manager when creating a new virtual network.
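In a network XML definition, this restriction corresponds to the dev attribute of the <forward> element, for example (eth1 being an assumed interface name):
 <forward mode='nat' dev='eth1'/>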

17.8. Examples of Common Scenarios

This section demonstrates different virtual networking modes and provides some example scenarios.

17.8.1. Bridged Mode

Bridged mode operates on Layer 2 of the OSI model. When used, all of the guest virtual machines will appear on the same subnet as the host physical machine. The most common use cases for bridged mode include:
  • Deploying guest virtual machines in an existing network alongside host physical machines, making the difference between virtual and physical machines transparent to the end user.
  • Deploying guest virtual machines without making any changes to existing physical network configuration settings.
  • Deploying guest virtual machines which must be easily accessible to an existing physical network. Placing guest virtual machines on a physical network where they must access services within an existing broadcast domain, such as DHCP.
  • Connecting guest virtual machines to an existing network where VLANs are used.

17.8.2. Routed Mode

DMZ

Consider a network where one or more nodes are placed in a controlled sub-network for security reasons. The deployment of a special sub-network such as this is a common practice, and the sub-network is known as a DMZ. See the following diagram for more details on this layout:

Sample DMZ configuration

Figure 17.8. Sample DMZ configuration

Host physical machines in a DMZ typically provide services to WAN (external) host physical machines as well as LAN (internal) host physical machines. As this requires them to be accessible from multiple locations, and considering that these locations are controlled and operated in different ways based on their security and trust level, routed mode is the best configuration for this environment.
Virtual Server Hosting

Consider a virtual server hosting company that has several host physical machines, each with two physical network connections. One interface is used for management and accounting, the other is for the virtual machines to connect through. Each guest has its own public IP address, but the host physical machines use private IP addresses, as management of the guests can only be performed by internal administrators. See the following diagram to understand this scenario:

Virtual server hosting sample configuration

Figure 17.9. Virtual server hosting sample configuration

17.8.3. NAT Mode

NAT (Network Address Translation) mode is the default mode. It can be used for testing when there is no need for direct network visibility.

17.8.4. Isolated Mode

Isolated mode allows virtual machines to communicate with each other only. They are unable to interact with the physical network.

17.9. Managing a Virtual Network

To configure a virtual network on your system:
  1. From the Edit menu, select Connection Details.
  2. This will open the Connection Details menu. Click the Virtual Networks tab.
    Virtual network configuration

    Figure 17.10. Virtual network configuration

  3. All available virtual networks are listed on the left of the menu. You can edit the configuration of a virtual network by selecting it from this box and editing as you see fit.

17.10. Creating a Virtual Network

To create a virtual network on your system using the Virtual Machine Manager (virt-manager):
  1. Open the Virtual Networks tab from within the Connection Details menu. Click the Add Network button, identified by a plus sign (+) icon. For more information, see Section 17.9, “Managing a Virtual Network”.
    Virtual network configuration

    Figure 17.11. Virtual network configuration

    This will open the Create a new virtual network window. Click Forward to continue.
    Naming your new virtual network

    Figure 17.12. Naming your new virtual network

  2. Enter an appropriate name for your virtual network and click Forward.
    Choosing an IPv4 address space

    Figure 17.13. Choosing an IPv4 address space

  3. Check the Enable IPv4 network address space definition check box.
    Enter an IPv4 address space for your virtual network in the Network field.
    Check the Enable DHCPv4 check box.
    Define the DHCP range for your virtual network by specifying a Start and End range of IP addresses.
    Choosing an IPv4 address space

    Figure 17.14. Choosing an IPv4 address space

    Click Forward to continue.
  4. If you want to enable IPv6, check the Enable IPv6 network address space definition.
    Enabling IPv6

    Figure 17.15. Enabling IPv6

    Additional fields appear in the Create a new virtual network window.
    Configuring IPv6

    Figure 17.16. Configuring IPv6

    Enter an IPv6 address in the Network field.
  5. If you want to enable DHCPv6, check the Enable DHCPv6 check box.
    Additional fields appear in the Create a new virtual network window.
    Configuring DHCPv6

    Figure 17.17. Configuring DHCPv6

    (Optional) Edit the start and end of the DHCPv6 range.
  6. If you want to enable static route definitions, check the Enable Static Route Definition check box.
    Additional fields appear in the Create a new virtual network window.
    Defining static routes

    Figure 17.18. Defining static routes

    Enter a network address and the gateway that will be used for the route to the network in the appropriate fields.
    Click Forward.
  7. Select how the virtual network should connect to the physical network.
    Connecting to the physical network

    Figure 17.19. Connecting to the physical network

    If you want the virtual network to be isolated, ensure that the Isolated virtual network radio button is selected.
    If you want the virtual network to connect to a physical network, select Forwarding to physical network, and choose whether the Destination should be Any physical device or a specific physical device. Also select whether the Mode should be NAT or Routed.
    If you want to enable IPv6 routing within the virtual network, check the Enable IPv6 internal routing/networking check box.
    Enter a DNS domain name for the virtual network.
    Click Finish to create the virtual network.
  8. The new virtual network is now available in the Virtual Networks tab of the Connection Details window.

17.11. Attaching a Virtual Network to a Guest

To attach a virtual network to a guest:
  1. In the Virtual Machine Manager window, highlight the guest that will have the network assigned.
    Selecting a virtual machine to display

    Figure 17.20. Selecting a virtual machine to display

  2. From the Virtual Machine Manager Edit menu, select Virtual Machine Details.
  3. Click the Add Hardware button on the Virtual Machine Details window.
  4. In the Add new virtual hardware window, select Network from the left pane, and select your network name (network1 in this example) from the Network source menu. Modify the MAC address, if necessary, and select a Device model. Click Finish.
    Select your network from the Add new virtual hardware window

    Figure 17.21. Select your network from the Add new virtual hardware window

  5. The new network is now displayed as a virtual network interface that will be presented to the guest upon launch.
    New network shown in guest hardware list

    Figure 17.22. New network shown in guest hardware list

17.12. Attaching a Virtual NIC Directly to a Physical Interface

As an alternative to the default NAT connection, you can use the macvtap driver to attach the guest's NIC directly to a specified physical interface of the host machine. This is not to be confused with device assignment (also known as passthrough). A macvtap connection has the following modes, each with different benefits and use cases:

Physical interface delivery modes

VEPA
In virtual ethernet port aggregator (VEPA) mode, all packets from the guests are sent to the external switch. This enables the user to force guest traffic through the switch. For VEPA mode to work correctly, the external switch must also support hairpin mode, which ensures that packets whose destination is a guest on the same host machine as their source guest are sent back to the host by the external switch.
VEPA mode

Figure 17.23. VEPA mode

bridge
Packets whose destination is on the same host machine as their source guest are directly delivered to the target macvtap device. Both the source device and the destination device need to be in bridge mode for direct delivery to succeed. If either one of the devices is in VEPA mode, a hairpin-capable external switch is required.
Bridge mode

Figure 17.24. Bridge mode

private
All packets are sent to the external switch and will only be delivered to a target guest on the same host machine if they are sent through an external router or gateway and these send them back to the host. Private mode can be used to prevent the individual guests on the single host from communicating with each other. This procedure is followed if either the source or destination device is in private mode.
Private mode

Figure 17.25. Private mode

passthrough
This feature attaches a physical interface device or a SR-IOV Virtual Function (VF) directly to a guest without losing the migration capability. All packets are sent directly to the designated network device. Note that a single network device can only be passed through to a single guest, as a network device cannot be shared between guests in passthrough mode.
Passthrough mode

Figure 17.26. Passthrough mode

Macvtap can be configured by changing the domain XML file or by using the virt-manager interface.

17.12.1. Configuring macvtap using domain XML

Open the domain XML file of the guest and modify the <devices> element as follows:
<devices>
	...
	<interface type='direct'>
		<source dev='eth0' mode='vepa'/>
	</interface>
</devices>
The network access of direct attached guest virtual machines can be managed by the hardware switch to which the physical interface of the host physical machine is connected.
The interface can have additional parameters as shown below, if the switch is conforming to the IEEE 802.1Qbg standard. The parameters of the virtualport element are documented in more detail in the IEEE 802.1Qbg standard. The values are network specific and should be provided by the network administrator. In 802.1Qbg terms, the Virtual Station Interface (VSI) represents the virtual interface of a virtual machine. Also note that IEEE 802.1Qbg requires a non-zero value for the VLAN ID.

Virtual Station Interface types

managerid
The VSI Manager ID identifies the database containing the VSI type and instance definitions. This is an integer value and the value 0 is reserved.
typeid
The VSI Type ID identifies a VSI type characterizing the network access. VSI types are typically managed by the network administrator. This is an integer value.
typeidversion
The VSI Type Version allows multiple versions of a VSI Type. This is an integer value.
instanceid
The VSI Instance ID is generated when a VSI instance (a virtual interface of a virtual machine) is created. This is a globally unique identifier.
profileid
The profile ID contains the name of the port profile that is to be applied onto this interface. This name is resolved by the port profile database into the network parameters from the port profile, and those network parameters will be applied to this interface.
Each of the four modes is configured by changing the domain XML file. Once this file is opened, change the mode setting as shown:
<devices>
 ...
 <interface type='direct'>
  <source dev='eth0.2' mode='vepa'/>
   <virtualport type="802.1Qbg">
    <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
   </virtualport>
  </interface>
</devices>
The profile ID is shown here:
<devices>
 ...
 <interface type='direct'>
  <source dev='eth0' mode='private'/>
   <virtualport type='802.1Qbh'>
    <parameters profileid='finance'/>
   </virtualport>
 </interface>
</devices>

17.12.2. Configuring macvtap using virt-manager

Open the virtual hardware details window ⇒ select NIC in the menu ⇒ for Network source, select host device name: macvtap ⇒ select the intended Source mode.
The virtual station interface types can then be set up in the Virtual port submenu.

Figure 17.27. Configuring macvtap in virt-manager

17.13. Dynamically Changing a Host Physical Machine or a Network Bridge that is Attached to a Virtual NIC

This section demonstrates how to move the vNIC of a guest virtual machine from one bridge to another while the guest virtual machine is running, without compromising the guest virtual machine.
  1. Prepare guest virtual machine with a configuration similar to the following:
    <interface type='bridge'>
          <mac address='52:54:00:4a:c9:5e'/>
          <source bridge='virbr0'/>
          <model type='virtio'/>
    </interface>
    
  2. Prepare an XML file for interface update:
    # cat br1.xml
    <interface type='bridge'>
          <mac address='52:54:00:4a:c9:5e'/>
          <source bridge='virbr1'/>
          <model type='virtio'/>
    </interface>
    
  3. Start the guest virtual machine, confirm the guest virtual machine's network functionality, and check that the guest virtual machine's vnetX is connected to the bridge you indicated.
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.5254007da9f2       yes             virbr0-nic
                                                            vnet0
    virbr1          8000.525400682996       yes             virbr1-nic
    
  4. Update the guest virtual machine's network with the new interface parameters by using the following command:
    # virsh update-device test1 br1.xml 
    
    Device updated successfully
    
    
  5. On the guest virtual machine, run service network restart. The guest virtual machine obtains a new IP address on the virbr1 network. Check that the guest virtual machine's vnet0 is connected to the new bridge (virbr1):
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.5254007da9f2       yes             virbr0-nic
    virbr1          8000.525400682996       yes             virbr1-nic
                                                            vnet0
    
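The attachment can also be verified from the host with virsh domiflist, which lists each vNIC of a domain together with its type, source, model, and MAC address. The output below is only an illustration, reusing the domain name test1 and the MAC address from the steps above:
    # virsh domiflist test1
    Interface  Type       Source     Model       MAC
    -------------------------------------------------------
    vnet0      bridge     virbr1     virtio      52:54:00:4a:c9:5e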

17.14. Applying Network Filtering

This section provides an introduction to libvirt's network filters, their goals, concepts and XML format.

17.14.1. Introduction

The goal of network filtering is to enable administrators of a virtualized system to configure and enforce network traffic filtering rules on virtual machines and to manage the parameters of network traffic that virtual machines are allowed to send or receive. The network traffic filtering rules are applied on the host physical machine when a virtual machine is started. Since the filtering rules cannot be circumvented from within the virtual machine, this makes them mandatory from the point of view of a virtual machine user.
From the point of view of the guest virtual machine, the network filtering system allows each virtual machine's network traffic filtering rules to be configured individually on a per interface basis. These rules are applied on the host physical machine when the virtual machine is started and can be modified while the virtual machine is running. The latter can be achieved by modifying the XML description of a network filter.
Multiple virtual machines can make use of the same generic network filter. When such a filter is modified, the network traffic filtering rules of all running virtual machines that reference this filter are updated. The machines that are not running will update on start.
As previously mentioned, applying network traffic filtering rules can be done on individual network interfaces that are configured for certain types of network configurations. Supported network types include:
  • network
  • ethernet -- must be used in bridging mode
  • bridge

Example 17.1. An example of network filtering

The interface XML is used to reference a top-level filter. In the following example, the interface description references the filter clean-traffic.
   <devices>
    <interface type='bridge'>
      <mac address='00:16:3e:5d:c7:9e'/>
      <filterref filter='clean-traffic'/>
    </interface>
  </devices>
Network filters are written in XML and may contain references to other filters, rules for traffic filtering, or a combination of both. The above referenced filter clean-traffic is a filter that only contains references to other filters and no actual filtering rules. Since references to other filters can be used, a tree of filters can be built. The clean-traffic filter can be viewed using the command: # virsh nwfilter-dumpxml clean-traffic.
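To see which network filters are available on the host before referencing one, the defined filters can also be listed. The output below is only an illustration; the exact set of predefined filters depends on the installed libvirt version:
# virsh nwfilter-list
 UUID                                  Name
------------------------------------------------------------------
 6ef53069-ba34-94a0-d33d-17751b9b8cb1  clean-traffic
 f88f1932-debf-4aa1-9fbe-f10d3aa4bc95  no-arp-spoofing
 ...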
As previously mentioned, a single network filter can be referenced by multiple virtual machines. Since interfaces will typically have individual parameters associated with their respective traffic filtering rules, the rules described in a filter's XML can be generalized using variables. In this case, the variable name is used in the filter XML and the name and value are provided at the place where the filter is referenced.

Example 17.2. Description extended

In the following example, the interface description has been extended with the parameter IP and a dotted IP address as a value.
  <devices>
    <interface type='bridge'>
      <mac address='00:16:3e:5d:c7:9e'/>
      <filterref filter='clean-traffic'>
        <parameter name='IP' value='10.0.0.1'/>
      </filterref>
    </interface>
  </devices>
In this particular example, the clean-traffic network traffic filter is instantiated with the IP address parameter 10.0.0.1, and the rule dictates that all traffic from this interface always uses 10.0.0.1 as the source IP address, which is one of the purposes of this particular filter.

17.14.2. Filtering Chains

Filtering rules are organized in filter chains. These chains can be thought of as having a tree structure with packet filtering rules as entries in individual chains (branches).
Packets start their filter evaluation in the root chain and can then continue their evaluation in other chains, return from those chains back into the root chain or be dropped or accepted by a filtering rule in one of the traversed chains.
Libvirt's network filtering system automatically creates individual root chains for every virtual machine's network interface on which the user chooses to activate traffic filtering. The user may write filtering rules that are either directly instantiated in the root chain or may create protocol-specific filtering chains for efficient evaluation of protocol-specific rules.
The following chains exist:
  • root
  • mac
  • stp (spanning tree protocol)
  • vlan
  • arp and rarp
  • ipv4
  • ipv6
Multiple chains evaluating the mac, stp, vlan, arp, rarp, ipv4, or ipv6 protocol can be created using the protocol name only as a prefix in the chain's name.

Example 17.3. ARP traffic filtering

This example allows chains with names arp-xyz or arp-test to be specified and have their ARP protocol packets evaluated in those chains.
The following filter XML shows an example of filtering ARP traffic in the arp chain.
<filter name='no-arp-spoofing' chain='arp' priority='-500'>
  <uuid>f88f1932-debf-4aa1-9fbe-f10d3aa4bc95</uuid>
  <rule action='drop' direction='out' priority='300'>
    <mac match='no' srcmacaddr='$MAC'/>
  </rule>
  <rule action='drop' direction='out' priority='350'>
    <arp match='no' arpsrcmacaddr='$MAC'/>
  </rule>
  <rule action='drop' direction='out' priority='400'>
    <arp match='no' arpsrcipaddr='$IP'/>
  </rule>
  <rule action='drop' direction='in' priority='450'>
    <arp opcode='Reply'/>
    <arp match='no' arpdstmacaddr='$MAC'/>
  </rule>
  <rule action='drop' direction='in' priority='500'>
    <arp match='no' arpdstipaddr='$IP'/>
  </rule>
  <rule action='accept' direction='inout' priority='600'>
    <arp opcode='Request'/>
  </rule>
  <rule action='accept' direction='inout' priority='650'>
    <arp opcode='Reply'/>
  </rule>
  <rule action='drop' direction='inout' priority='1000'/>
</filter>
The consequence of putting ARP-specific rules in the arp chain, rather than, for example, in the root chain, is that packets of protocols other than ARP do not need to be evaluated by ARP protocol-specific rules. This improves the efficiency of the traffic filtering. However, one must then pay attention to only putting filtering rules for the given protocol into the chain, since other rules will not be evaluated. For example, an IPv4 rule will not be evaluated in the ARP chain, since IPv4 protocol packets will not traverse the ARP chain.
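A custom filter such as the no-arp-spoofing example above is made known to libvirt by saving its XML to a file and defining it, after which it can be referenced from an interface like any other filter. A minimal sketch, assuming the XML was saved as no-arp-spoofing.xml (the file name is arbitrary, and defining a filter under an existing name replaces the previous definition):
# virsh nwfilter-define no-arp-spoofing.xml
# virsh nwfilter-dumpxml no-arp-spoofing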

17.14.3. Filtering Chain Priorities

As previously mentioned, when creating a filtering rule, all chains are connected to the root chain. The order in which those chains are accessed is influenced by the priority of the chain. The following table shows the chains that can be assigned a priority and their default priorities.

Table 17.1. Filtering chain default priorities values

Chain (prefix)    Default priority
stp               -810
mac               -800
vlan              -750
ipv4              -700
ipv6              -600
arp               -500
rarp              -400

Note

A chain with a lower priority value is accessed before one with a higher value.
The chains listed in Table 17.1, “Filtering chain default priorities values” can also be assigned custom priorities by writing a value in the range [-1000 to 1000] into the priority (XML) attribute in the filter node. The filter shown in Section 17.14.2, “Filtering Chains” uses the default priority of -500 for the arp chain, for example.
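For example, to have packets traverse a custom ARP chain before the ipv4 chain (default priority -700), a lower value could be written into the filter's chain declaration. This is only a sketch; the filter name custom-arp-rules and the chosen value are placeholders, not taken from the guide:
<filter name='custom-arp-rules' chain='arp-test' priority='-710'>
  ...
</filter>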

17.14.4. Usage of Variables in Filters

There are two variables that have been reserved for usage by the network traffic filtering subsystem: MAC and IP.
MAC is designated for the MAC address of the network interface. A filtering rule that references this variable is automatically replaced with the MAC address of the interface. This works without the user having to explicitly provide the MAC parameter. Even though it is possible to specify the MAC parameter similarly to the IP parameter above, doing so is discouraged, because libvirt already knows which MAC address an interface will be using.
The parameter IP represents the IP address that the operating system inside the virtual machine is expected to use on the given interface. The IP parameter is special in so far as the libvirt daemon tries to determine the IP address (and thus the IP parameter's value) that is being used on an interface if the parameter is not explicitly provided but referenced. For current limitations on IP address detection, as well as how to use this feature and what to expect when using it, consult Section 17.14.12, “Limitations”. The XML file shown in Section 17.14.2, “Filtering Chains” contains the filter no-arp-spoofing, which is an example of using a network filter XML to reference the MAC and IP variables.
Note that referenced variables are always prefixed with the character $. The format of the value of a variable must be of the type expected by the filter attribute identified in the XML. In the above example, the IP parameter must hold a legal IP address in standard format. Failure to provide the correct structure will result in the filter variable not being replaced with a value, and will prevent the virtual machine from starting or prevent the interface from attaching when hot plugging is being used. Some of the types that are expected for each XML attribute are shown in Example 17.4, “Sample variable types”.

Example 17.4. Sample variable types

As variables can contain lists of elements (the variable IP can contain multiple IP addresses that are valid on a particular interface, for example), the notation for providing multiple elements for the IP variable is:
  <devices>
    <interface type='bridge'>
      <mac address='00:16:3e:5d:c7:9e'/>
      <filterref filter='clean-traffic'>
        <parameter name='IP' value='10.0.0.1'/>
        <parameter name='IP' value='10.0.0.2'/>
        <parameter name='IP' value='10.0.0.3'/>
      </filterref>
    </interface>
  </devices>
This XML file creates filters to enable multiple IP addresses per interface. Each of the IP addresses will result in a separate filtering rule. Therefore, using the XML above and the following rule, three individual filtering rules (one for each IP address) will be created:
  <rule action='accept' direction='in' priority='500'>
    <tcp srcipaddr='$IP'/>
  </rule>
It is possible to access individual elements of a variable holding a list of elements. For example, a filtering rule like the following accesses the second element of the variable DSTPORTS; the values of such a list variable are supplied where the filter is referenced, as shown in the sketch after the rule.
  <rule action='accept' direction='in' priority='500'>
    <udp dstportstart='$DSTPORTS[1]'/>
  </rule>
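The values of a user-defined list variable such as DSTPORTS are provided in the same way as the IP addresses above, that is, as repeated parameter elements where the filter is referenced. A hedged sketch; the filter name my-filter and the port values are placeholders:
  <filterref filter='my-filter'>
    <parameter name='DSTPORTS' value='80'/>
    <parameter name='DSTPORTS' value='8080'/>
  </filterref>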

Example 17.5. Using a variety of variables

It is possible to create filtering rules that instantiate all combinations of entries from different lists by using the notation $VARIABLE[@<iterator id="x">]. The following rule allows a virtual machine to receive traffic on a set of ports, which are specified in DSTPORTS, from the set of source IP addresses specified in SRCIPADDRESSES. The rule generates all combinations of elements of the variable DSTPORTS with those of SRCIPADDRESSES by using two independent iterators to access their elements.
  <rule action='accept' direction='in' priority='500'>
    <ip srcipaddr='$SRCIPADDRESSES[@1]' dstportstart='$DSTPORTS[@2]'/>
  </rule>
Assign concrete values to SRCIPADDRESSES and DSTPORTS as shown:
  SRCIPADDRESSES = [ 10.0.0.1, 11.1.2.3 ]
  DSTPORTS = [ 80, 8080 ]
Accessing the variables using $SRCIPADDRESSES[@1] and $DSTPORTS[@2], as in the rule above, then results in all combinations of addresses and ports being created:
  • 10.0.0.1, 80
  • 10.0.0.1, 8080
  • 11.1.2.3, 80
  • 11.1.2.3, 8080
Accessing the same variables using a single iterator, for example by using the notation $SRCIPADDRESSES[@1] and $DSTPORTS[@1], would result in parallel access to both lists and result in the following combination:
  • 10.0.0.1, 80
  • 11.1.2.3, 8080

Note

$VARIABLE is short-hand for $VARIABLE[@0]. The former notation always assumes the iterator with iterator id="0", as described in the opening paragraph at the top of this section.

17.14.5. Automatic IP Address Detection and DHCP Snooping

This section provides information about automatic IP address detection and DHCP snooping.

17.14.5.1. Introduction

The detection of IP addresses used on a virtual machine's interface is automatically activated if the variable IP is referenced but no value has been assigned to it. The variable CTRL_IP_LEARNING can be used to specify the IP address learning method to use. Valid values include: any, dhcp, or none.
The value any instructs libvirt to use any packet to determine the address in use by a virtual machine, which is the default setting if the variable CTRL_IP_LEARNING is not set. This method only detects a single IP address per interface. Once a guest virtual machine's IP address has been detected, its IP network traffic is locked to that address if, for example, IP address spoofing is prevented by one of its filters. In that case, the user of the VM will not be able to change the IP address on the interface inside the guest virtual machine, which would be considered IP address spoofing. When a guest virtual machine is migrated to another host physical machine or resumed after a suspend operation, the first packet sent by the guest virtual machine will again determine the IP address that the guest virtual machine can use on a particular interface.
The value dhcp instructs libvirt to only honor DHCP server-assigned addresses with valid leases. This method supports the detection and usage of multiple IP addresses per interface. When a guest virtual machine resumes after a suspend operation, any valid IP address leases are applied to its filters. Otherwise, the guest virtual machine is expected to use DHCP to obtain new IP addresses. When a guest virtual machine migrates to another host physical machine, the guest virtual machine is required to re-run the DHCP protocol.
If CTRL_IP_LEARNING is set to none, libvirt does not do IP address learning and referencing IP without assigning it an explicit value is an error.
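When learning is disabled in this way, the address must therefore be supplied explicitly where the filter is referenced. A minimal sketch, reusing the clean-traffic filter and a placeholder address:
    <filterref filter='clean-traffic'>
      <parameter name='CTRL_IP_LEARNING' value='none'/>
      <parameter name='IP' value='10.0.0.1'/>
    </filterref>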

17.14.5.2. DHCP Snooping

CTRL_IP_LEARNING=dhcp (DHCP snooping) provides additional anti-spoofing security, especially when combined with a filter allowing only trusted DHCP servers to assign IP addresses. To enable this, set the variable DHCPSERVER to the IP address of a valid DHCP server and provide filters that use this variable to filter incoming DHCP responses.
When DHCP snooping is enabled and the DHCP lease expires, the guest virtual machine will no longer be able to use the IP address until it acquires a new, valid lease from a DHCP server. If the guest virtual machine is migrated, it must get a new valid DHCP lease to use an IP address (for example by bringing the VM interface down and up again).
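A hedged sketch of how both variables might be supplied; my-dhcp-filter stands in for a filter whose rules actually reference $DHCPSERVER, and 10.0.0.1 is a placeholder for the trusted DHCP server's address:
    <filterref filter='my-dhcp-filter'>
      <parameter name='CTRL_IP_LEARNING' value='dhcp'/>
      <parameter name='DHCPSERVER' value='10.0.0.1'/>
    </filterref>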

Note

Automatic DHCP detection listens to the DHCP traffic the guest virtual machine exchanges with the DHCP server of the infrastructure. To avoid denial-of-service attacks on libvirt, the evaluation of those packets is rate-limited, meaning that a guest virtual machine sending an excessive number of DHCP packets per second on an interface will not have all of those packets evaluated, and thus the filters may not get adapted. Normal DHCP client behavior is assumed to send a low number of DHCP packets per second. Further, it is important to set up appropriate filters on all guest virtual machines in the infrastructure to prevent them from being able to send DHCP packets. Therefore, guest virtual machines must either be prevented from sending UDP and TCP traffic from port 67 to port 68, or the DHCPSERVER variable should be used on all guest virtual machines to restrict DHCP server messages to only be allowed to originate from trusted DHCP servers. At the same time, anti-spoofing prevention must be enabled on all guest virtual machines in the subnet.

Example 17.6. Activating IPs for DHCP snooping

The following XML provides an example for the activation of IP address learning using the DHCP snooping method:
    <interface type='bridge'>
      <source bridge='virbr0'/>
      <filterref filter='clean-traffic'>
        <parameter name='CTRL_IP_LEARNING' value='dhcp'/>
      </filterref>
    </interface>

17.14.6. Reserved Variables

Table 17.2, “Reserved variables” shows the variables that are considered reserved and are used by libvirt:

Table 17.2. Reserved variables

Variable Name       Definition
MAC                 The MAC address of the interface
IP                  The list of IP addresses in use by an interface
IPV6                Not currently implemented: the list of IPv6 addresses in use by an interface
DHCPSERVER          The list of IP addresses of trusted DHCP servers
DHCPSERVERV6        Not currently implemented: the list of IPv6 addresses of trusted DHCP servers
CTRL_IP_LEARNING    The choice of the IP address detection mode

17.14.7. Element and Attribute Overview

The root element required for all network filters is named <filter> with two possible attributes. The name attribute provides a unique name for the given filter. The chain attribute is optional, but allows certain filters to be better organized for more efficient processing by the firewall subsystem of the underlying host physical machine. Currently, the system supports the chains listed in Section 17.14.2, “Filtering Chains”: root, mac, stp, vlan, arp, rarp, ipv4, and ipv6.

17.14.8. References to Other Filters

Any filter may hold references to other filters. Individual filters may be referenced multiple times in a filter tree but references between filters must not introduce loops.

Example 17.7. An Example of a clean traffic filter

The following shows the XML of the clean-traffic network filter referencing several other filters.
<filter name='clean-traffic'>
  <uuid>6ef53069-ba34-94a0-d33d-17751b9b8cb1</uuid>
  <filterref filter='no-mac-spoofing'/>
  <filterref filter='no-ip-spoofing'/>
  <filterref filter='allow-incoming-ipv4'/>
  <filterref filter='no-arp-spoofing'/>
  <filterref filter='no-other-l2-traffic'/>
  <filterref filter='qemu-announce-self'/>
</filter>
To reference another filter, the XML node <filterref> needs to be provided inside a filter node. This node must have the attribute filter whose value contains the name of the filter to be referenced.
New network filters can be defined at any time and may contain references to network filters that are not yet known to libvirt. However, once a virtual machine is started or a network interface referencing a filter is to be hot-plugged, all network filters in the filter tree must be available. Otherwise, the virtual machine will not start or the network interface cannot be attached.

17.14.9. Filter Rules

The following XML shows a simple example of a network traffic filter implementing a rule to drop traffic if the IP address (provided through the value of the variable IP) in an outgoing IP packet is not the expected one, thus preventing IP address spoofing by the VM.

Example 17.8. Example of network traffic filtering

<filter name='no-ip-spoofing' chain='ipv4'>
  <uuid>fce8ae33-e69e-83bf-262e-30786c1f8072</uuid>
  <rule action='drop' direction='out' priority='500'>
    <ip match='no' srcipaddr='$IP'/>
  </rule>
</filter>
The traffic filtering rule starts with the rule node. This node can have the following attributes:
  • action is mandatory and can have the following values:
    • drop (matching the rule silently discards the packet with no further analysis)
    • reject (matching the rule generates an ICMP reject message with no further analysis)
    • accept (matching the rule accepts the packet with no further analysis)
    • return (matching the rule passes this filter, but returns control to the calling filter for further analysis)
    • continue (matching the rule goes on to the next rule for further analysis)
  • direction is mandatory and can have the following values:
    • in for incoming traffic
    • out for outgoing traffic
    • inout for incoming and outgoing traffic
  • priority is optional. The priority of the rule controls the order in which the rule will be instantiated relative to other rules. Rules with lower values are instantiated before rules with higher values. Valid values are in the range of -1000 to 1000. If this attribute is not provided, priority 500 is assigned by default. Note that filtering rules in the root chain are sorted together with filters connected to the root chain, following their priorities. This makes it possible to interleave filtering rules with access to filter chains. See Section 17.14.3, “Filtering Chain Priorities” for more information.
  • statematch is optional. Possible values are '0' or 'false' to turn the underlying connection state matching off. The default setting is 'true' or '1'. A short sketch combining these optional attributes follows at the end of this section.
The above example, Example 17.8, “Example of network traffic filtering”, indicates that the traffic of type ip will be associated with the chain ipv4 and that the rule has priority=500. If, for example, another filter is referenced whose traffic of type ip is also associated with the chain ipv4, then that filter's rules will be ordered relative to the priority=500 of the shown rule.
A rule node may contain a single protocol-filtering element. The above example shows that traffic of type ip is to be filtered.
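As a further illustration of the optional rule attributes described above, the following hedged sketch shows a rule with an explicit priority and with connection-state matching turned off; the values are placeholders and the rule is not taken from a predefined filter:
  <rule action='accept' direction='in' priority='200' statematch='false'>
    <ip/>
  </rule>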

17.14.10. Supported Protocols

The following sections list and give some details about the protocols that are supported by the network filtering subsystem. The traffic type that a rule filters is provided in the rule node as a nested node. Depending on the traffic type a rule is filtering, the attributes are different. The above example showed the single attribute srcipaddr, which is valid inside the ip traffic filtering node. The following sections show which attributes are valid and what type of data they expect. The following datatypes are available:
  • UINT8 : 8 bit integer; range 0-255
  • UINT16: 16 bit integer; range 0-65535
  • MAC_ADDR: MAC address in colon-separated hexadecimal format, for example 00:11:22:33:44:55
  • MAC_MASK: MAC address mask in MAC address format, for instance, FF:FF:FF:FC:00:00
  • IP_ADDR: IP address in dotted decimal format, for example 10.1.2.3
  • IP_MASK: IP address mask in either dotted decimal format (255.255.248.0) or CIDR mask (0-32)
  • IPV6_ADDR: IPv6 address in numbers format, for example FFFF::1
  • IPV6_MASK: IPv6 mask in numbers format (FFFF:FFFF:FC00::) or CIDR mask (0-128)
  • STRING: A string
  • BOOLEAN: 'true', 'yes', '1' or 'false', 'no', '0'
  • IPSETFLAGS: The source and destination flags of the ipset described by up to 6 'src' or 'dst' elements selecting features from either the source or destination part of the packet header; example: src,src,dst. The number of 'selectors' to provide here depends on the type of ipset that is referenced
Every attribute except for those of type IP_MASK or IPV6_MASK can be negated using the match attribute with value no. Multiple negated attributes may be grouped together. The following XML fragment shows such an example using abstract attributes.
[...]
  <rule action='drop' direction='in'>
    <protocol match='no' attribute1='value1' attribute2='value2'/>
    <protocol attribute3='value3'/>
  </rule>
[...]
A rule is evaluated logically within the boundaries of the given protocol attributes: if a single attribute's value does not match the one given in the rule, the whole rule is skipped during the evaluation process. Therefore, in the above example, incoming traffic is only dropped if the protocol property attribute1 does not match value1, the protocol property attribute2 does not match value2, and the protocol property attribute3 matches value3.

17.14.10.1. MAC (Ethernet)

Protocol ID: mac
Rules of this type should go into the root chain.

Table 17.3. MAC protocol types

Attribute Name    Datatype                         Definition
srcmacaddr        MAC_ADDR                         MAC address of sender
srcmacmask        MAC_MASK                         Mask applied to MAC address of sender
dstmacaddr        MAC_ADDR                         MAC address of destination
dstmacmask        MAC_MASK                         Mask applied to MAC address of destination
protocolid        UINT16 (0x600-0xffff), STRING    Layer 3 protocol ID. Valid strings include [arp, rarp, ipv4, ipv6]
comment           STRING                           text string up to 256 characters
The filter can be written as such:
[...]
<mac match='no' srcmacaddr='$MAC'/>
[...]

17.14.10.2. VLAN (802.1Q)

Protocol ID: vlan
Rules of this type should go either into the root or vlan chain.

Table 17.4. VLAN protocol types

Attribute Name    Datatype                         Definition
srcmacaddr        MAC_ADDR                         MAC address of sender
srcmacmask        MAC_MASK                         Mask applied to MAC address of sender
dstmacaddr        MAC_ADDR                         MAC address of destination
dstmacmask        MAC_MASK                         Mask applied to MAC address of destination
vlan-id           UINT16 (0x0-0xfff, 0 - 4095)     VLAN ID
encap-protocol    UINT16 (0x03c-0xfff), STRING     Encapsulated layer 3 protocol ID, valid strings are arp, ipv4, ipv6
com