Chapter 11. Enabling RT-KVM for NFV Workloads

To facilitate installing and configuring Red Hat Enterprise Linux Real Time KVM (RT-KVM), Red Hat OpenStack Platform provides the following features:

  • A real-time Compute node role that provisions Red Hat Enterprise Linux for Real Time.
  • The additional RT-KVM kernel module.
  • Automatic configuration of the Compute node.

11.1. Planning for your RT-KVM Compute nodes

When planning for RT-KVM Compute nodes, ensure that the following tasks are completed:

  • You must use Red Hat certified servers for your RT-KVM Compute nodes.

    For more information, see Red Hat Enterprise Linux for Real Time certified servers.

  • Register your undercloud and attach a valid Red Hat OpenStack Platform subscription.

    For more information, see Registering the undercloud and attaching subscriptions in Installing and managing Red Hat OpenStack Platform with director.

  • Enable the repositories that are required for the undercloud, such as the rhel-9-for-x86_64-nfv-rpms repository for RT-KVM, and update the system packages to the latest versions.

    Note

    You need a separate subscription to a Red Hat OpenStack Platform for Real Time SKU before you can access this repository.

    For more information, see Enabling repositories for the undercloud in Installing and managing Red Hat OpenStack Platform with director.

Building the real-time image

  1. Install the libguestfs-tools package on the undercloud to get the virt-customize tool:

    (undercloud) [stack@undercloud-0 ~]$ sudo dnf install libguestfs-tools
    Important

    If you install the libguestfs-tools package on the undercloud, disable iscsid.socket to avoid port conflicts with the tripleo_iscsid service on the undercloud:

    $ sudo systemctl disable --now iscsid.socket
  2. Extract the images:

    (undercloud) [stack@undercloud-0 ~]$ tar -xf /usr/share/rhosp-director-images/overcloud-hardened-uefi-full-17.1.x86_64.tar
    (undercloud) [stack@undercloud-0 ~]$ tar -xf /usr/share/rhosp-director-images/ironic-python-agent-17.1.x86_64.tar
  3. Copy the default image:

    (undercloud) [stack@undercloud-0 ~]$ cp overcloud-hardened-uefi-full.qcow2 overcloud-realtime-compute.qcow2
  4. Register your image to enable Red Hat repositories relevant to your customizations. Replace [username] and [password] with valid credentials in the following example.

    virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
    'subscription-manager register --username=[username] --password=[password]' \
    --run-command 'subscription-manager release --set 9.0'
    Note

    For security, you can remove credentials from the history file if they are used on the command prompt. You can delete individual lines in history using the history -d command followed by the line number.

  5. Find a list of pool IDs from your account’s subscriptions, and attach the appropriate pool ID to your image.

    sudo subscription-manager list --all --available | less
    ...
    virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
    'subscription-manager attach --pool [pool-ID]'
  6. Add the repositories necessary for Red Hat OpenStack Platform with NFV.

    virt-customize -a overcloud-realtime-compute.qcow2 --run-command \
    'sudo subscription-manager repos --enable=rhel-9-for-x86_64-baseos-eus-rpms \
    --enable=rhel-9-for-x86_64-appstream-eus-rpms \
    --enable=rhel-9-for-x86_64-highavailability-eus-rpms \
    --enable=ansible-2.9-for-rhel-9-x86_64-rpms \
    --enable=rhel-9-for-x86_64-nfv-rpms \
    --enable=fast-datapath-for-rhel-9-x86_64-rpms'
  7. Create a script to configure real-time capabilities on the image.

    (undercloud) [stack@undercloud-0 ~]$ cat <<'EOF' > rt.sh
    #!/bin/bash

    set -eux

    dnf -v -y --setopt=protected_packages= erase kernel.$(uname -m)
    dnf -v -y install kernel-rt kernel-rt-kvm tuned-profiles-nfv-host
    grubby --set-default /boot/vmlinuz*rt*
    EOF
  8. Run the script to configure the real-time image:

    (undercloud) [stack@undercloud-0 ~]$ virt-customize -a overcloud-realtime-compute.qcow2 -v --run rt.sh 2>&1 | tee virt-customize.log
    Note

    If you see the following line in the rt.sh script output, "grubby fatal error: unable to find a suitable template", you can ignore this error.

  9. Examine the virt-customize.log file from the previous command to check that the rt.sh script installed the packages correctly.

    (undercloud) [stack@undercloud-0 ~]$ cat virt-customize.log | grep Verifying
    
      Verifying  : kernel-3.10.0-957.el7.x86_64                                 1/1
      Verifying  : 10:qemu-kvm-tools-rhev-2.12.0-18.el7_6.1.x86_64              1/8
      Verifying  : tuned-profiles-realtime-2.10.0-6.el7_6.3.noarch              2/8
      Verifying  : linux-firmware-20180911-69.git85c5d90.el7.noarch             3/8
      Verifying  : tuned-profiles-nfv-host-2.10.0-6.el7_6.3.noarch              4/8
      Verifying  : kernel-rt-kvm-3.10.0-957.10.1.rt56.921.el7.x86_64            5/8
      Verifying  : tuna-0.13-6.el7.noarch                                       6/8
      Verifying  : kernel-rt-3.10.0-957.10.1.rt56.921.el7.x86_64                7/8
      Verifying  : rt-setup-2.0-6.el7.x86_64                                    8/8
  10. Relabel SELinux:

    (undercloud) [stack@undercloud-0 ~]$ virt-customize -a overcloud-realtime-compute.qcow2 --selinux-relabel
  11. Extract vmlinuz and initrd:

    (undercloud) [stack@undercloud-0 ~]$ mkdir image
    (undercloud) [stack@undercloud-0 ~]$ guestmount -a overcloud-realtime-compute.qcow2 -i --ro image
    (undercloud) [stack@undercloud-0 ~]$ cp image/boot/vmlinuz-3.10.0-862.rt56.804.el7.x86_64 ./overcloud-realtime-compute.vmlinuz
    (undercloud) [stack@undercloud-0 ~]$ cp image/boot/initramfs-3.10.0-862.rt56.804.el7.x86_64.img ./overcloud-realtime-compute.initrd
    (undercloud) [stack@undercloud-0 ~]$ guestunmount image
    Note

    The version string in the vmlinuz and initramfs file names varies with the kernel version.

  12. Upload the image:

    (undercloud) [stack@undercloud-0 ~]$ openstack overcloud image upload --update-existing --os-image-name overcloud-realtime-compute.qcow2

You now have a real-time image you can use with the ComputeOvsDpdkRT composable role on your selected Compute nodes.
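
Because the version string in the vmlinuz and initramfs file names changes between builds, it can help to discover it rather than type it by hand in the extraction step. The following is a minimal sketch, not part of the documented procedure. The function name find_rt_kernel_version is hypothetical, the version strings are placeholders, and the sketch assumes that real-time kernels can be recognized by the same vmlinuz*rt* pattern that the rt.sh script uses.

```shell
# Hypothetical helper: print the newest real-time kernel version found in a
# boot directory, for example image/boot after the guestmount command.
find_rt_kernel_version() {
  boot_dir=$1
  for kernel in "$boot_dir"/vmlinuz-*rt*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$kernel" ] || continue
    # Strip the directory and the "vmlinuz-" prefix, leaving only the version.
    printf '%s\n' "${kernel##*/vmlinuz-}"
  done | sort -V | tail -n 1
}

# Demonstration against a throwaway directory; the versions are placeholders.
demo_boot=$(mktemp -d)
touch "$demo_boot/vmlinuz-0.0.1-2.x86_64" \
      "$demo_boot/vmlinuz-0.0.1-1.rt1.x86_64" \
      "$demo_boot/vmlinuz-0.0.1-2.rt1.x86_64"

rt_version=$(find_rt_kernel_version "$demo_boot")
echo "vmlinuz-$rt_version"
echo "initramfs-$rt_version.img"
rm -rf "$demo_boot"
```

With the image mounted as in the extraction step, calling find_rt_kernel_version image/boot would give the string to substitute into the vmlinuz and initramfs paths of the cp commands.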

Modifying BIOS settings on RT-KVM Compute nodes

To reduce latency on your RT-KVM Compute nodes, disable all options for the following parameters in your Compute node BIOS settings:

  • Power Management
  • Hyper-Threading
  • CPU sleep states
  • Logical processors
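
After you change the BIOS settings, you might want to confirm from the running node that simultaneous multithreading is off. The following is a minimal sketch, assuming a kernel that exposes the /sys/devices/system/cpu/smt/active file. The smt_state function name is hypothetical, and the demonstration reads a temporary file rather than the real sysfs entry.

```shell
# Hypothetical helper: report whether SMT (Hyper-Threading) is active.
# With no argument it reads the sysfs file; an argument overrides the path.
smt_state() {
  smt_file=${1:-/sys/devices/system/cpu/smt/active}
  if [ ! -r "$smt_file" ]; then
    echo unknown
    return
  fi
  case $(cat "$smt_file") in
    0) echo disabled ;;
    1) echo enabled ;;
    *) echo unknown ;;
  esac
}

# Demonstration with a stand-in file that mimics a node with SMT turned off.
sample_file=$(mktemp)
echo 0 > "$sample_file"
state=$(smt_state "$sample_file")
echo "SMT is $state"
rm -f "$sample_file"
```

On a Compute node, calling smt_state with no argument reads the sysfs file directly. A result of enabled would suggest that the Hyper-Threading or Logical processors option is still on.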

11.2. Configuring OVS-DPDK with RT-KVM

11.2.1. Designating nodes for Real-time Compute

To designate nodes for Real-time Compute, create a new role file to configure the Real-time Compute role, and configure the bare-metal nodes with a Real-time Compute resource class to tag the Compute nodes for real-time.

Note

The following procedure applies to new overcloud nodes that you have not yet provisioned. To assign a resource class to an existing overcloud node that has already been provisioned, scale down the overcloud to unprovision the node, then scale up the overcloud to reprovision the node with the new resource class assignment. For more information, see Scaling overcloud nodes in Installing and managing Red Hat OpenStack Platform with director.

Procedure

  1. Log in to the undercloud host as the stack user.
  2. Source the stackrc undercloud credentials file:

    [stack@director ~]$ source ~/stackrc
  3. Based on the /usr/share/openstack-tripleo-heat-templates/environments/compute-real-time-example.yaml file, create a compute-real-time.yaml environment file that sets the parameters for the ComputeRealTime role.
  4. Generate a new roles data file named roles_data_rt.yaml that includes the ComputeRealTime role, along with any other roles that you need for the overcloud. The following example generates the roles data file roles_data_rt.yaml, which includes the roles Controller, Compute, and ComputeRealTime:

    (undercloud)$ openstack overcloud roles generate \
    -o /home/stack/templates/roles_data_rt.yaml \
    ComputeRealTime Compute Controller
  5. Update the roles_data_rt.yaml file for the ComputeRealTime role:

    ###################################################
    # Role: ComputeRealTime                           #
    ###################################################
    - name: ComputeRealTime
      description: |
        Real Time Compute Node role
      CountDefault: 1
      # Create external Neutron bridge
      tags:
        - compute
        - external_bridge
      networks:
        InternalApi:
          subnet: internal_api_subnet
        Tenant:
          subnet: tenant_subnet
        Storage:
          subnet: storage_subnet
      HostnameFormatDefault: '%stackname%-computert-%index%'
      deprecated_nic_config_name: compute-rt.yaml
  6. Register the ComputeRealTime nodes for the overcloud by adding them to your node definition template: node.json or node.yaml.

    For more information, see Registering nodes for the overcloud in Installing and managing Red Hat OpenStack Platform with director.

  7. Inspect the node hardware:

    (undercloud)$ openstack overcloud node introspect --all-manageable --provide

    For more information, see Creating an inventory of the bare-metal node hardware in Installing and managing Red Hat OpenStack Platform with director.

  8. Tag each bare-metal node that you want to designate for ComputeRealTime with a custom ComputeRealTime resource class:

    (undercloud)$ openstack baremetal node set \
     --resource-class baremetal.RTCOMPUTE <node>

    Replace <node> with the name or UUID of the bare-metal node.

  9. Add the ComputeRealTime role to your node definition file, overcloud-baremetal-deploy.yaml, and define any predictive node placements, resource classes, network topologies, or other attributes that you want to assign to your nodes:

    - name: Controller
      count: 3
      ...
    - name: Compute
      count: 3
      ...
    - name: ComputeRealTime
      count: 1
      defaults:
        resource_class: baremetal.RTCOMPUTE
        network_config:
          template: /home/stack/templates/nic-config/<role_topology_file>
    • Replace <role_topology_file> with the name of the topology file to use for the ComputeRealTime role, for example, myRoleTopology.j2. You can reuse an existing network topology or create a new custom network interface template for the role.

      For more information, see Custom network interface templates in Installing and managing Red Hat OpenStack Platform with director. To use the default network definition settings, do not include network_config in the role definition.

      For more information about the properties you can use to configure node attributes in your node definition file, see Bare-metal node provisioning attributes in Installing and managing Red Hat OpenStack Platform with director.

      For an example node definition file, see Example node definition file in Installing and managing Red Hat OpenStack Platform with director.

  10. Create the following Ansible playbook to configure the kernel during the node provisioning, and save the playbook as /home/stack/templates/fix_rt_kernel.yaml:

    # RealTime KVM fix until BZ #2122949 is closed
    - name: Fix RT Kernel
      hosts: allovercloud
      any_errors_fatal: true
      gather_facts: false
      vars:
        reboot_wait_timeout: 900
      pre_tasks:
        - name: Wait for provisioned nodes to boot
          wait_for_connection:
            timeout: 600
            delay: 10
      tasks:
        - name: Fix bootloader entry
          become: true
          shell: |-
            set -eux
            new_entry=$(grep saved_entry= /boot/grub2/grubenv | sed -e s/saved_entry=//)
            source /etc/default/grub
            sed -i "s/options.*/options root=$GRUB_DEVICE ro $GRUB_CMDLINE_LINUX $GRUB_CMDLINE_LINUX_DEFAULT/" /boot/loader/entries/$(</etc/machine-id)$new_entry.conf
            cp -f /boot/grub2/grubenv /boot/efi/EFI/redhat/grubenv
      post_tasks:
        - name: Configure reboot after new kernel
          become: true
          reboot:
            reboot_timeout: "{{ reboot_wait_timeout }}"
          when: reboot_wait_timeout is defined
  11. Include /home/stack/templates/fix_rt_kernel.yaml as a playbook in the ComputeOvsDpdkSriovRT role definition in your node provisioning file:

    - name: ComputeOvsDpdkSriovRT
      ...
      ansible_playbooks:
        - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
          extra_vars:
            kernel_args: "default_hugepagesz=1GB hugepagesz=1G hugepages=64 iommu=pt intel_iommu=on tsx=off isolcpus=2-19,22-39"
            reboot_wait_timeout: 900
            tuned_profile: "cpu-partitioning"
            tuned_isolated_cores: "2-19,22-39"
            defer_reboot: true
        - playbook: /home/stack/templates/fix_rt_kernel.yaml
          extra_vars:
            reboot_wait_timeout: 1800

    For more information about the properties you can use to configure node attributes in your node definition file, see Bare-metal node provisioning attributes in Installing and managing Red Hat OpenStack Platform with director.

    For an example node definition file, see Example node definition file in Installing and managing Red Hat OpenStack Platform with director.

  12. Provision the new nodes for your role:

    (undercloud)$ openstack overcloud node provision \
    [--stack <stack>] \
    [--network-config] \
    --output <deployment_file> \
    /home/stack/templates/overcloud-baremetal-deploy.yaml
    • Optional: Replace <stack> with the name of the stack for which the bare-metal nodes are provisioned. The default is overcloud.
    • Optional: Include the --network-config optional argument to provide the network definitions to the cli-overcloud-node-network-config.yaml Ansible playbook. If you do not define the network definitions by using the network_config property, then the default network definitions are used.
    • Replace <deployment_file> with the name of the heat environment file to generate for inclusion in the deployment command, for example /home/stack/templates/overcloud-baremetal-deployed.yaml.
  13. Monitor the provisioning progress in a separate terminal. When provisioning is successful, the node state changes from available to active:

    (undercloud)$ watch openstack baremetal node list
  14. If you ran the provisioning command without the --network-config option, then configure the <Role>NetworkConfigTemplate parameters in your network-environment.yaml file to point to your NIC template files:

    parameter_defaults:
      ComputeNetworkConfigTemplate: /home/stack/templates/nic-configs/compute.j2
      ComputeRealTimeNetworkConfigTemplate: /home/stack/templates/nic-configs/<rt_compute>.j2
      ControllerNetworkConfigTemplate: /home/stack/templates/nic-configs/controller.j2

    Replace <rt_compute> with the name of the file that contains the network topology of the ComputeRealTime role, for example, computert.yaml to use the default network topology.

  15. Add your environment file to the stack with your other environment files and deploy the overcloud:

    (undercloud)$ openstack overcloud deploy --templates \
     -r /home/stack/templates/roles_data_rt.yaml \
     -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
     -e /home/stack/templates/node-info.yaml \
     -e [your environment files] \
     -e /home/stack/templates/compute-real-time.yaml
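
In the node provisioning example, the isolcpus value inside kernel_args and the tuned_isolated_cores value carry the same CPU list. The following is a minimal sketch of a pre-deployment check that catches a typo in one of the two. The function name isolcpus_from_args is hypothetical, and the assumption that the two values should be identical is drawn only from the example in this procedure.

```shell
# Hypothetical helper: print the value of the isolcpus= argument from a
# kernel command-line string.
isolcpus_from_args() {
  # Word splitting is intentional: each kernel argument becomes one word.
  for arg in $1; do
    case $arg in
      isolcpus=*) printf '%s\n' "${arg#isolcpus=}" ;;
    esac
  done
}

# Values copied from the ansible_playbooks example in this procedure.
kernel_args="default_hugepagesz=1GB hugepagesz=1G hugepages=64 iommu=pt intel_iommu=on tsx=off isolcpus=2-19,22-39"
tuned_isolated_cores="2-19,22-39"

isolated=$(isolcpus_from_args "$kernel_args")
if [ "$isolated" = "$tuned_isolated_cores" ]; then
  echo "isolcpus and tuned_isolated_cores agree: $isolated"
else
  echo "mismatch: isolcpus=$isolated tuned_isolated_cores=$tuned_isolated_cores" >&2
fi
```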

11.2.2. Configuring OVS-DPDK parameters

  1. Under parameter_defaults, set the tunnel type to vxlan, and the network type to vxlan,vlan:

    NeutronTunnelTypes: 'vxlan'
    NeutronNetworkType: 'vxlan,vlan'
  2. Under parameter_defaults, set the bridge mapping:

    # The OVS logical->physical bridge mappings to use.
    NeutronBridgeMappings:
      - dpdk-mgmt:br-link0
  3. Under parameter_defaults, set the role-specific parameters for the ComputeOvsDpdkSriov role:

      ##########################
      # OVS DPDK configuration #
      ##########################
      ComputeOvsDpdkSriovParameters:
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=32 iommu=pt intel_iommu=on isolcpus=2-19,22-39"
        TunedProfileName: "cpu-partitioning"
        IsolCpusList: "2-19,22-39"
        NovaComputeCpuDedicatedSet: ['4-19,24-39']
        NovaReservedHostMemory: 4096
        OvsDpdkSocketMemory: "3072,1024"
        OvsDpdkMemoryChannels: "4"
        OvsPmdCoreList: "2,22,3,23"
        NovaComputeCpuSharedSet: [0,20,1,21]
        NovaLibvirtRxQueueSize: 1024
        NovaLibvirtTxQueueSize: 1024
    Note

    To prevent failures during guest creation, assign at least one CPU with its sibling thread on each NUMA node. In the example, the values for the OvsPmdCoreList parameter denote cores 2 and 22 from NUMA 0, and cores 3 and 23 from NUMA 1.

    Note

    These huge pages are consumed by the virtual machines, and also by OVS-DPDK through the OvsDpdkSocketMemory parameter, as shown in this procedure. The number of huge pages available for the virtual machines is the hugepages boot parameter value minus the pages that OvsDpdkSocketMemory reserves.

    You must also add hw:mem_page_size=1GB to the flavor you associate with the DPDK instance.

    Note

    OvsDpdkMemoryChannels is a required setting for this procedure. For optimum operation, ensure you deploy DPDK with appropriate parameters and values.

  4. Configure the role-specific parameters for SR-IOV:

      NovaPCIPassthrough:
        - vendor_id: "8086"
          product_id: "1528"
          address: "0000:06:00.0"
          trusted: "true"
          physical_network: "sriov-1"
        - vendor_id: "8086"
          product_id: "1528"
          address: "0000:06:00.1"
          trusted: "true"
          physical_network: "sriov-2"
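
The huge page note in step 3 can be worked through with the example values: hugepages=32 with 1 GB pages, and OvsDpdkSocketMemory set to "3072,1024" (MB per NUMA node). The following is a minimal sketch of that arithmetic. The function name hugepages_for_vms is hypothetical, and rounding each socket allocation up to whole pages is an assumption.

```shell
# Hypothetical helper: huge pages left for instances after OVS-DPDK takes
# its per-socket memory.
#   $1 = total huge pages (the hugepages= boot parameter)
#   $2 = huge page size in MB
#   $3 = OvsDpdkSocketMemory value, comma-separated MB per NUMA node
hugepages_for_vms() {
  total_pages=$1
  page_mb=$2
  socket_mem=$3
  used_pages=0
  old_ifs=$IFS
  IFS=,
  for mb in $socket_mem; do
    # Round each socket allocation up to a whole number of pages.
    used_pages=$((used_pages + (mb + page_mb - 1) / page_mb))
  done
  IFS=$old_ifs
  echo $((total_pages - used_pages))
}

# 32 pages of 1024 MB, with 3072 MB and 1024 MB reserved for OVS-DPDK.
remaining=$(hugepages_for_vms 32 1024 "3072,1024")
echo "Huge pages available for instances: $remaining"
```

With these values, OVS-DPDK takes three pages on the first NUMA node and one on the second, which leaves 28 of the 32 pages for instances.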

11.3. Launching an RT-KVM instance

Perform the following steps to launch an RT-KVM instance on a real-time enabled Compute node:

  1. Create an RT-KVM flavor on the overcloud:

    $ openstack flavor create r1.small --id 99 --ram 4096 --disk 20 --vcpus 4
    $ openstack flavor set --property hw:cpu_policy=dedicated 99
    $ openstack flavor set --property hw:cpu_realtime=yes 99
    $ openstack flavor set --property hw:mem_page_size=1GB 99
    $ openstack flavor set --property hw:cpu_realtime_mask="^0-1" 99
    $ openstack flavor set --property hw:cpu_emulator_threads=isolate 99
  2. Launch an RT-KVM instance:

    $ openstack server create --image <rhel> --flavor r1.small --nic net-id=<dpdk-net> test-rt
  3. To verify that the instance uses the assigned emulator threads, run the following command:

    $ virsh dumpxml <instance-id> | grep vcpu -A1
    <vcpu placement='static'>4</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='1'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <vcpupin vcpu='2' cpuset='5'/>
      <vcpupin vcpu='3' cpuset='7'/>
      <emulatorpin cpuset='0-1'/>
      <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
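
To compare the result with the flavor properties without reading the XML by eye, you can extract the relevant attributes. The following is a minimal sketch that runs against a copy of the sample output above rather than a live virsh call. The xml_attr function name is hypothetical, and the sed pattern assumes single-quoted attributes, as in the sample.

```shell
# Hypothetical helper: print the value of one attribute of one element.
#   $1 = element name, $2 = attribute name; the XML is read from stdin.
xml_attr() {
  sed -n "s/.*<$1[^>]*[[:space:]]$2='\([^']*\)'.*/\1/p"
}

# Sample taken from the verification output in this section.
sample_xml=$(cat <<'EOF'
<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <vcpupin vcpu='2' cpuset='5'/>
  <vcpupin vcpu='3' cpuset='7'/>
  <emulatorpin cpuset='0-1'/>
  <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
</cputune>
EOF
)

emulator_cpus=$(printf '%s\n' "$sample_xml" | xml_attr emulatorpin cpuset)
realtime_vcpus=$(printf '%s\n' "$sample_xml" | xml_attr vcpusched vcpus)
echo "Emulator threads pinned to host CPUs: $emulator_cpus"
echo "vCPUs with the fifo scheduler: $realtime_vcpus"
```

In the sample, the vCPUs with the fifo scheduler are 2-3, which is consistent with the hw:cpu_realtime_mask="^0-1" property excluding vCPUs 0 and 1 from real-time scheduling.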
    </cputune>