How to configure CPU pinning without use of `isolcpus` kernel cmdline parameter in Red Hat OpenStack Platform

Solution In Progress - Updated -

Environment

Red Hat Enterprise Linux OpenStack Platform 7
Red Hat OpenStack Platform 8
Red Hat OpenStack Platform 9
Red Hat OpenStack Platform 10

Issue

For virtualization hosts, it is common to confine host operating system processes to a subset of the available CPU/RAM resources, leaving the rest available for exclusive use by virtual guests. Historically, the most common way to do this has been the isolcpus kernel argument. However, the semantics of this argument have changed: it is now primarily intended for use with real-time guests. On recent kernels, a side effect is that any CPUs listed in isolcpus are also excluded from load balancing by the kernel scheduler, including any QEMU threads belonging to virtual guests running on those CPUs. For non-real-time guests it is typically preferable that the QEMU threads are in fact load balanced across CPUs, which is at odds with this approach to CPU isolation.

Resolution

Note: Red Hat's objective is to provide proper isolation through systemd so that isolcpus will not be needed in the future. The tuned parameter no_balance_cores replaces part of the functionality of isolcpus, although it has not yet been confirmed as an optimal solution. Hence, when using NFV technologies such as PCI passthrough, SR-IOV, or DPDK, the use of isolcpus is still mandatory, even with RHEL 7.5 GA. Currently, systemd CPUAffinity does not provide sufficient isolation. For more details, see https://bugzilla.redhat.com/show_bug.cgi?id=1497182
This also raises the concern that the emulator thread started by the qemu-kvm process is pinned with affinity to all of the isolated physical CPU cores on which the emulated vCPU threads are running. To address this concern, pin the emulator threads to the range of physical CPUs that are neither isolated via isolcpus nor used for instances via vcpu_pin_set.
Hence, until a preferred and tested solution is confirmed, CPU isolation must be configured via tuned's cpu-partitioning profile and the kernel's isolcpus parameter, with the emulator threads reassigned as described above.
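As an illustration of the emulator thread reassignment, the pinning of a running instance can be inspected and changed with virsh. The domain name instance-00000001 and the host CPU range 0-1 below are placeholders for this sketch; substitute the actual domain and your non-isolated host CPUs:

```shell
# Show the current emulator thread pinning for a guest
# (instance-00000001 is a placeholder domain name).
virsh emulatorpin instance-00000001

# Repin the emulator threads to the non-isolated host CPUs
# (here 0-1), both for the running guest (--live) and
# persistently in the domain XML (--config).
virsh emulatorpin instance-00000001 0-1 --live --config
```

These commands must be run on the hypervisor against a live guest, so no generic output is shown here.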

The kernel's isolcpus parameter and systemd CPUAffinity can be configured at deployment time via first-boot configuration, using the HostIsolatedCoreList parameter.
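For director-based deployments, this can be expressed in an environment file passed to the overcloud deploy command. The file name and core list below are placeholders for this sketch; only the HostIsolatedCoreList parameter name comes from the first-boot template:

```yaml
# compute-isolation.yaml (example environment file; the core list
# must match the CPUs you want reserved for guests on your hardware)
parameter_defaults:
  HostIsolatedCoreList: "2,3,4,5,6,7,8,9,10,11"
```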

These settings can also be applied manually using the following steps.

  • The CPUAffinity option in /etc/systemd/system.conf sets affinity for systemd itself, as well as everything it launches, unless their .service file overrides the CPUAffinity setting with its own value.
    Configure the CPUAffinity option in /etc/systemd/system.conf.
-------------------
#CPUAffinity=1 2
-------------------

Note: The CPUAffinity option takes a list of CPU indices or ranges separated by whitespace.

For example, in an environment with CPU pinning, on a hypervisor with 4 CPUs where nova is confined to CPUs 2 and 3, configure the following:

[root@overcloud-compute-0 ~]# grep CPUAffinity /etc/systemd/system.conf
CPUAffinity=0 1
[root@overcloud-compute-0 ~]# grep vcpu_pin_set /etc/nova/nova.conf 
vcpu_pin_set=2,3
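After rebooting the host, the effective affinity of systemd (PID 1) can be checked with taskset. The comment below describes the output expected with CPUAffinity=0 1; it is not a captured transcript:

```shell
# Confirm that PID 1 (systemd) is confined to the host CPUs;
# with CPUAffinity=0 1 the affinity list should contain only 0,1.
taskset -cp 1
```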
  • isolcpus can be added to the default GRUB configuration.
vi /etc/default/grub
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=28 iommu=pt intel_iommu=on isolcpus=2,3,4,5,6,7,8,9,10,11"
grub2-mkconfig -o /etc/grub2.cfg
A reboot is required for the updated kernel command line to take effect.
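Since the resolution above also calls for tuned's cpu-partitioning profile, a minimal sketch of enabling it and then verifying the isolation after a reboot follows. The core list 2-11 is an example matching the grub line above and must be adapted to your host:

```shell
# Install and activate the cpu-partitioning tuned profile;
# isolated_cores must match the isolcpus list used in grub.
yum install -y tuned-profiles-cpu-partitioning
echo "isolated_cores=2-11" > /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning

# After rebooting, verify that the isolation took effect.
grep -o 'isolcpus=[^ ]*' /proc/cmdline
cat /sys/devices/system/cpu/isolated
```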

Diagnostic Steps

Refer to https://access.redhat.com/solutions/2759041 for further details.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

1 Comment

The KCS solution seems ambiguous, countering its own suggestions with information that is not really important to customers.

If two possibilities (isolcpus, CPUAffinity) are discussed here for isolating cores on the compute node and dedicating them to the VMs, then either another method such as the tuned cpu-partitioning profile should not be introduced at all, or CPUAffinity should not be discussed as a manual configuration option in /etc/systemd/system.conf, given that BZ#1497182 reports it not providing the expected functionality. As it stands, a reader redirected here from an excellent read like the Red Hat Stack blog is left wondering why the author of the blog did not refine this KCS solution before mentioning it.

Can someone please refine this KCS to:
  1. Use the tuned cpu-partitioning profile as the sole recommendation.
  2. Revisit whether it is worth mentioning no_balance_cores, as it adds confusion and unneeded information.
  3. Divide it into two parts: Director-based deployments (needing template modifications) vs. non-Director deployments (needing manual configuration in the respective files).