Can't live migrate some instances between hosts with different core counts

Solution In Progress - Updated -

Environment

  • Red Hat OpenStack 6.0 (RHOS)

Issue

  • Instance vCPUs are statically pinned to pCPUs when NUMA is available. This prevents live migration when an instance is pinned to cores whose numbers are higher than the highest core number on the destination compute host.

  • Sample hardware

compute01 has 2 socket * 18 core * 2 for hyperthreading = 72 (cores 0-71)
compute02 has 2 socket * 16 core * 2 for hyperthreading = 64 (cores 0-63)
  • vCPU affinity of instances on compute01
VCPU: CPU Affinity
----------------------------------
0: 18-35,54-71
1: 18-35,54-71
  • vCPU affinity of instances on compute02
VCPU: CPU Affinity
----------------------------------
0: 0-17,36-53
1: 0-17,36-53
  • XML dump of the instance shows the statically pinned vCPUs:
<vcpu placement='static' cpuset='18-35,54-71'>2</vcpu>
<vcpu placement='static' cpuset='0-17,36-53'>2</vcpu>
  • libvirt will display the following errors in /var/log/nova/nova-compute.log when a live migration is attempted:
Aug 27 08:19:18 compute01  nova-compute: libvirtError: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dinstance\x2d0002a870.scope/cpuset.cpus': Numerical result out of range
Aug 27 08:19:18 compute02 journal: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dinstance\x2d0002a870.scope/cpuset.cpus': Numerical result out of range
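
The mismatch behind these errors can be illustrated with a short Python sketch. It expands a pinned cpuset string and lists the CPU ids that do not exist on a destination host with a given number of CPUs. The function names are hypothetical and are not part of nova or libvirt. The parser is a simplification that handles only comma-separated ids and ranges, not the "^N" exclusion syntax. The values are the sample figures from this article.

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a cpuset string such as '18-35,54-71' into a set of CPU ids.

    Simplified on purpose: only 'N' and 'N-M' entries are handled.
    """
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_missing_on_host(spec: str, host_cpu_count: int) -> list[int]:
    """Return pinned CPU ids that are absent on a host with ids 0..host_cpu_count-1."""
    return sorted(cpu for cpu in parse_cpuset(spec) if cpu >= host_cpu_count)


# compute02 in the sample hardware exposes CPUs 0-63 (64 in total).
missing = cpus_missing_on_host("18-35,54-71", 64)
print(missing)  # [64, 65, 66, 67, 68, 69, 70, 71]

# A cpuset that stays below 64 fits on compute02.
print(cpus_missing_on_host("0-17,36-53", 64))  # []
```

A non-empty result means the pinned cpuset refers to CPUs the destination does not have, which matches the "Numerical result out of range" error shown above.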

Resolution

  • This issue is resolved by an erratum. Update to python-nova-2014.2.3-29.el7ost or later on all compute hosts.

Root Cause

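  • When nova detects that the system supports NUMA, it schedules vCPUs to NUMA cells and automatically pins them to the pCPUs in the same cell in order to optimize memory bandwidth. The pinned cpuset is kept during live migration, so a cpuset that refers to pCPUs that do not exist on the destination host cannot be applied there.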

Diagnostic Steps

  • Live migration will fail and the following error logs will be displayed in /var/log/nova/nova-compute.log:
Aug 27 08:19:18 compute01  nova-compute: libvirtError: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dinstance\x2d0002a870.scope/cpuset.cpus': Numerical result out of range
Aug 27 08:19:18 compute02 journal: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2dinstance\x2d0002a870.scope/cpuset.cpus': Numerical result out of range
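
To see which cpuset an instance is pinned to, the `cpuset` attribute can be read from the instance's libvirt domain XML (for example, saved output of `virsh dumpxml`). The following Python sketch does this with the standard library. The helper name is hypothetical, and the sample document is a minimal stand-in built around the `<vcpu>` line shown in this article, not a complete domain definition.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def pinned_cpuset(domain_xml: str) -> Optional[str]:
    """Return the cpuset attribute of the <vcpu> element, or None if not pinned."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")  # <vcpu> is a direct child of <domain>
    if vcpu is None:
        return None
    return vcpu.get("cpuset")


# Minimal, hypothetical domain XML; only the <vcpu> line is taken from the article.
sample = """<domain type='kvm'>
  <vcpu placement='static' cpuset='18-35,54-71'>2</vcpu>
</domain>"""

print(pinned_cpuset(sample))  # 18-35,54-71
```

If the highest CPU id in the returned cpuset is greater than the highest CPU id on the destination host, the instance is a candidate for the failure described here.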

