VM scheduling failure - NUMAAffinityFilter

Solution In Progress

Issue

  • The overcloud was redeployed successfully. In our "performance" test scenario (SR-IOV, CPU pinning, NUMA affinity), everything works until roughly 70% of the physical CPU cores (pCPUs) are allocated. Just above that threshold, VM scheduling starts failing in the NUMATopologyFilter, which reports that no more resources are available on the host, even though free physical resources remain. The flavor used:
(overcloud) [stack@undercloud ~]$ openstack flavor show be6b1e46-ceb8-450d-be61-66b937ec2c2c
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| Field                      | Value                                                                                                                       |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                                                                       |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                           |
| access_project_ids         | None                                                                                                                        |
| disk                       | 40                                                                                                                          |
| id                         | be6b1e46-ceb8-450d-be61-66b937ec2c2c                                                                                        |
| name                       | m1.medium.corepin.numapin                                                                                                   |
| os-flavor-access:is_public | True                                                                                                                        |
| properties                 | hw:cpu_policy='dedicated', hw:emulator_threads_policy='isolate', hw:numa_nodes='4', hw:pci_numa_affinity_policy='preferred' |
| ram                        | 4096                                                                                                                        |
| rxtx_factor                | 1.0                                                                                                                         |
| swap                       |                                                                                                                             |
| vcpus                      | 2                                                                                                                           |
+----------------------------+-----------------------------------------------------------------------------------------------------------------------------+
  • VMs now fail to schedule even though memory and CPU cores remain available:
| fault                               | {u'message': u'Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance b88d23bd-61bf-4bb4-b201-8e2ae0b139e4. Last exception: Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the g', u'code': 500, u'details': u'Traceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 604, in build_instances\n    filter_properties, instances[0].uuid)\n  File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 557, in populate_retry\n    raise exception.MaxRetriesExceeded(reason=msg)\nMaxRetriesExceeded: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance b88d23bd-61bf-4bb4-b201-8e2ae0b139e4. Last exception: Insufficient compute resources: Requested instance NUMA topology together with requested PCI devices cannot fit the given host NUMA topology.\n', u'created': u'2019-12-31T15:48:56Z'} |
| flavor                              | m1.medium.corepin.numapin
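
A rough way to see how pinned-CPU capacity can run out before the cores look fully used: with `hw:cpu_policy='dedicated'`, each guest vCPU is pinned to its own host pCPU, and `hw:emulator_threads_policy='isolate'` reserves one additional dedicated pCPU per instance for the emulator threads. The sketch below applies that to the 2-vCPU flavor above. The host size (`HOST_DEDICATED_PCPUS = 40`) is a hypothetical value, not taken from this environment, and the estimate ignores NUMA and PCI constraints entirely.

```python
# Rough capacity estimate for the flavor above (2 vCPUs,
# hw:cpu_policy=dedicated, hw:emulator_threads_policy=isolate).
# HOST_DEDICATED_PCPUS is a hypothetical value, not from this environment.

HOST_DEDICATED_PCPUS = 40  # hypothetical number of host pCPUs available for pinning
FLAVOR_VCPUS = 2           # "vcpus" field of m1.medium.corepin.numapin
EMULATOR_PCPUS = 1         # isolate: one extra dedicated pCPU per instance


def pcpus_per_instance(vcpus: int, emulator_pcpus: int) -> int:
    """pCPUs consumed by one pinned instance, including the emulator-thread pCPU."""
    return vcpus + emulator_pcpus


def max_instances(host_pcpus: int, per_instance: int) -> int:
    """Instances that fit by pCPU count alone (NUMA and PCI constraints ignored)."""
    return host_pcpus // per_instance


per_instance = pcpus_per_instance(FLAVOR_VCPUS, EMULATOR_PCPUS)
fits = max_instances(HOST_DEDICATED_PCPUS, per_instance)
guest_vcpus = fits * FLAVOR_VCPUS  # pCPUs running guest vCPUs, excluding emulator threads

print(f"pCPUs per instance: {per_instance}")
print(f"instances that fit: {fits}")
print(f"guest vCPUs pinned: {guest_vcpus} of {HOST_DEDICATED_PCPUS} "
      f"({100 * guest_vcpus / HOST_DEDICATED_PCPUS:.0f}%)")
```

With these assumed numbers, 13 instances pin 39 of the 40 pCPUs, while the guests' own vCPUs account for only 26 of them (65%). Whether this contributes to the threshold seen here would have to be checked against the compute nodes' actual pinning configuration (for example `vcpu_pin_set`, or `[compute] cpu_dedicated_set` on newer releases).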
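
The NUMATopologyFilter evaluates each host NUMA node separately rather than host-wide totals, so a host can have enough free pCPUs and memory in aggregate while no placement of the guest's NUMA cells fits. The following is a deliberately simplified model, not Nova's implementation: it assumes each guest NUMA cell must land on a distinct host NUMA cell with enough free pCPUs and memory, and it ignores PCI devices, SMT siblings and emulator-thread placement. The `Cell` type, the `fit_guest` function and all host and guest figures are hypothetical.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    """One NUMA node (host or guest): pCPU count and memory in MiB."""
    pcpus: int
    memory_mb: int


def fit_guest(host: List[Cell], guest: List[Cell]) -> Optional[Tuple[int, ...]]:
    """Return the host cell index chosen for each guest cell, or None.

    Simplified model: every guest cell needs its own host cell with enough
    free pCPUs and memory. PCI, SMT siblings and emulator threads are ignored.
    """
    for combo in permutations(range(len(host)), len(guest)):
        if all(host[h].pcpus >= g.pcpus and host[h].memory_mb >= g.memory_mb
               for h, g in zip(combo, guest)):
            return combo
    return None


# Hypothetical free resources on a 4-node host after some instances are placed.
host = [Cell(pcpus=0, memory_mb=20000),
        Cell(pcpus=3, memory_mb=20000),
        Cell(pcpus=4, memory_mb=20000),
        Cell(pcpus=2, memory_mb=20000)]

# Hypothetical guests with two NUMA cells each.
big = [Cell(pcpus=4, memory_mb=2048), Cell(pcpus=4, memory_mb=2048)]
small = [Cell(pcpus=2, memory_mb=2048), Cell(pcpus=2, memory_mb=2048)]

total_free = sum(c.pcpus for c in host)
print(total_free, sum(c.pcpus for c in big))  # 9 free vs 8 requested
print(fit_guest(host, big))    # None: only one node has 4 free pCPUs
print(fit_guest(host, small))  # (1, 2)
```

Brute force over permutations is adequate for a model like this because a host has only a handful of NUMA nodes; the point is that 9 free pCPUs in total do not guarantee a fit for an 8-pCPU guest split across two nodes.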

Environment

  • Red Hat OpenStack Platform 16.0 (RHOSP)
  • Red Hat OpenStack Platform 15.0 (RHOSP)
  • Red Hat OpenStack Platform 14.0 (RHOSP)
  • Red Hat OpenStack Platform 13.0 (RHOSP)
