The degree of risk to OpenStack deployments from L1TF is contingent on the way Hyper-Threading is used by guests. Nova has a number of CPU placement policies which will affect this:
Overcommit CPUs (“hw:cpu_policy=shared”, the default out of the box)
Guest vCPUs are allowed to float freely across host CPUs. vCPUs from distinct guests may be scheduled on SMT siblings at any time and are thus at risk from L1TF.
Dedicated CPUs (“hw:cpu_policy=dedicated”, opt-in via flavor or image props)
Guest vCPUs are pinned 1:1 with host physical CPUs. The “hw:cpu_thread_policy” flavor / image property will influence placement policy for hyper-threads:
- “prefer” - the guest can be placed on hosts with or without SMT enabled. If the guest is on a host with SMT, host CPUs will be assigned linearly to guest CPUs, which usually results in HT siblings being assigned to the same guest. This is the default if “hw:cpu_thread_policy” is not set.
- “require” - guest will only be placed on hosts with SMT enabled. Guest CPUs will be assigned to SMT siblings if possible.
- “isolate” - if the guest is placed on a host with SMT, the SMT sibling of each guest CPU will be reserved to prevent its use by other guests. This is the safest configuration with respect to L1TF.
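As an illustration, a dedicated-CPU flavor using the safest thread policy could be defined with the standard openstack CLI. These commands run against a live cloud; the flavor name and sizing below are hypothetical, not from this article:

```shell
# Create a flavor whose vCPUs are pinned 1:1 to host CPUs, reserving the
# SMT sibling of each pinned CPU (safest with respect to L1TF). The flavor
# name and sizes are illustrative:
openstack flavor create --vcpus 4 --ram 8192 --disk 40 m1.pinned
openstack flavor set m1.pinned \
  --property hw:cpu_policy=dedicated \
  --property hw:cpu_thread_policy=isolate
```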
With both the “prefer” and “require” policies in place, there is still a possibility that SMT siblings will run vCPUs from distinct guests. This risk is elevated if flavors have odd CPU counts, or if emulator threads are assigned dedicated host CPUs.
In most cases the supplementary (non-CPU related) threads in QEMU will run on the same host CPUs that are used by guest CPUs. There is thus the possibility that an emulator thread and a vCPU from the same guest will be scheduled on SMT siblings. This may be a concern if supplementary threads are handling data (such as encryption keys) that should not be exposed to the guest. This can apply to TLS host certificate keys during the initial TLS session handshake. OpenStack uses a unique LUKS volume encryption key per disk, so it is minimally impacted.
There is a possibility that kernel threads might be running on the same host CPUs as guest CPUs, which may place the host kernel at risk from the guest CPUs. On KVM realtime hosts this can be avoided with the “isolcpus” kernel parameter, but this is not recommended for non-realtime hosts as it disables scheduler workload re-balancing.
If the nova.conf “vcpu_pin_set” config parameter is used, it may be necessary to update its CPU mask after disabling SMT. This parameter controls which host CPUs Nova will consider available for guest CPU placement.
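As a sketch of how to audit the mask, the standard Linux sysfs topology files show which CPUs are online and how SMT siblings are paired; the example mask values in the comments are illustrative:

```shell
# Show which host CPUs are currently online; after booting with SMT
# disabled, the offlined siblings disappear from this list:
cat /sys/devices/system/cpu/online
# Show each core's SMT sibling pairing (e.g. "0,8"); IDs that appear only
# as the second sibling should be dropped from vcpu_pin_set.
# Example (illustrative): with siblings paired as N,N+8, a mask of
# vcpu_pin_set=2-7,10-15 would shrink to vcpu_pin_set=2-7 once SMT is off.
sort -u /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
```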
Methods for disabling Hyper-Threading
If you decide to disable Hyper-Threading (SMT), two methods are available:
- BIOS setting. This usually requires interactive administrator changes; however, some hardware vendors may expose this setting through onboard management cards, allowing for automation. Please check with your hardware vendor to determine if this is an option.
- Kernel boot option. The updated kernels add boot-time parameters that allow the administrator to prevent the kernel from using HT siblings. This can be fully automated in OpenStack deployments, as illustrated later in this document. Additional information can be found in our Disabling Hyper-Threading article.
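On RHEL-based hosts the boot option can be scripted with grubby, for example (a sketch; this mutates the boot configuration, requires root, and takes effect only after a reboot):

```shell
# Append "nosmt" to the kernel command line of all installed kernels so the
# kernel brings up only one thread per core at boot:
grubby --update-kernel=ALL --args="nosmt"
```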
Hot-unplug of HT siblings on running hosts via sysfs is strongly discouraged for OpenStack environments. Disabling CPUs will adversely impact running guests pinned to those CPUs, and OpenStack management services have not yet been validated to behave correctly when CPUs are disabled, which may affect the success of future guest launch operations.
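Reading, as opposed to writing, the sysfs SMT state is safe; on kernels with the L1TF updates the following read-only files report the SMT and mitigation status:

```shell
# 1 if SMT is currently active, 0 otherwise (read-only, no hot-unplug):
cat /sys/devices/system/cpu/smt/active
# The kernel's summary of the L1TF mitigation state:
cat /sys/devices/system/cpu/vulnerabilities/l1tf
```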
How should a customer disable Hyper-Threading?
Procedure on a fresh deployment (OSP12, OSP13)
- Deploy with -e $THT/environments/host-config-and-reboot.yaml where $THT is the directory used for TripleO Heat Templates.
- The parameter KernelArgs should be provided in the deployment environment file, with the set of kernel boot parameters to be applied on the compute roles. Example with the “Compute” role:
parameter_defaults:
  ComputeParameters:
    KernelArgs: "nosmt"
- In the example, KernelArgs doesn’t include other arguments needed for specific configurations (DPDK, SR-IOV, realtime, etc.). Where such configurations are in use, those arguments also need to be included.
- This parameter must be provided for every role used by the nodes (e.g. Controller, ComputeSriov, etc.).
During the deployment, the overcloud nodes will be configured with the required kernel options and then rebooted before the remaining services are deployed.
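Putting the pieces together, the deploy command might look like the following. The environment file name kernel-args.yaml is hypothetical and stands for the file containing the KernelArgs parameter_defaults; this command runs against a live undercloud:

```shell
# $THT points at the TripleO Heat Templates directory, as above;
# kernel-args.yaml is an illustrative name for the KernelArgs env file:
openstack overcloud deploy --templates \
  -e $THT/environments/host-config-and-reboot.yaml \
  -e ~/templates/kernel-args.yaml
```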
Procedure on a running deployment (OSP 10, 11, 12 and 13)
SSH to the Director node, download the Ansible playbooks provided by Red Hat and load overcloud stack credentials:
$ source ~/stackrc
To use the CVE-2018-3620-fix_disable_ht.yml playbook, call ansible-playbook with the hosts to be changed specified in the HOSTS extra var:
$ ansible-playbook -i /bin/tripleo-ansible-inventory -e "HOSTS=overcloud" CVE-2018-3620-fix_disable_ht.yml
To use CVE-2018-3620-apply_settings.yml, specify the desired features in extra vars, as well as the systems to be targeted in the HOSTS variable. For example:
- To turn off SMT, but allow runtime changes, and to leave the default behavior of conditional L1 data cache flushes:
$ ansible-playbook -i /bin/tripleo-ansible-inventory -e "ansible_ssh_common_args='-o StrictHostKeyChecking=no' HOSTS=overcloud SMT=1" CVE-2018-3620-apply_settings.yml
- To apply all mitigations and prevent runtime changes:
$ ansible-playbook -i /bin/tripleo-ansible-inventory -e "ansible_ssh_common_args='-o StrictHostKeyChecking=no' HOSTS=overcloud FLUSH=1 SMT=1 FORCE=1" CVE-2018-3620-apply_settings.yml
- To reset the L1TF settings to their default state:
$ ansible-playbook -i /bin/tripleo-ansible-inventory -e "ansible_ssh_common_args='-o StrictHostKeyChecking=no' HOSTS=overcloud RESET=1" CVE-2018-3620-apply_settings.yml
There is a related issue between the tuned cpu-partitioning profile and these mitigations. The profile is configured in OSP by setting TunedProfileName: "cpu-partitioning" in the tripleo-heat-templates, and is primarily used in Real-Time or DPDK environments. If an environment is using this profile, wait until the new tuned package is available before applying the other mitigations in this article.