Unshelve operation fails for some instances due to wrong compute node selection

Solution In Progress - Updated -

Issue

  • Spawn multiple SR-IOV instances and shelve them. Confirm that the instances are removed from the compute nodes and that the corresponding Glance images are created successfully. When all of the instances are then unshelved, some of them go into ERROR status because the scheduler selects a compute node that is not part of the SR-IOV-capable availability zone (AZ).

  • The unshelve operation fails with the following call trace on the compute node. The name of the chosen compute node can be found in the nova-scheduler logs on the controller nodes. In the failing case, nova tries to spawn the instance on a compute node that does not have an SR-IOV-capable NIC.

----------------- From: /var/log/nova/nova-scheduler.log on the controller node -----------------

2016-12-27 07:59:18.027 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Starting with 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:70
2016-12-27 07:59:18.028 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter AggregateInstanceExtraSpecsFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.028 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter RetryFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.029 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter AvailabilityZoneFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.039 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter NUMATopologyFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.040 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter PciPassthroughFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.040 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter RamFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.041 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter ComputeFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.041 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter ImagePropertiesFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.041 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter CoreFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.042 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter ServerGroupAffinityFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.042 45111 DEBUG nova.filters [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filter ServerGroupAntiAffinityFilter returned 7 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-27 07:59:18.043 45111 DEBUG nova.scheduler.filter_scheduler [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Filtered [(overcloud-compute-4.localdomain, overcloud-compute-4.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, (overcloud-compute-1.localdomain, overcloud-compute-1.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, (overcloud-compute-6.localdomain, overcloud-compute-6.localdomain) ram:251601 disk:45631488 io_ops:0 instances:4, (overcloud-compute-5.localdomain, overcloud-compute-5.localdomain) ram:252625 disk:45631488 io_ops:0 instances:3, (overcloud-compute-2.localdomain, overcloud-compute-2.localdomain) ram:251601 disk:45631488 io_ops:0 instances:4, (overcloud-compute-0.localdomain, overcloud-compute-0.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, (overcloud-compute-3.localdomain, overcloud-compute-3.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0] _schedule /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:152
2016-12-27 07:59:18.043 45111 DEBUG nova.scheduler.filter_scheduler [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Weighed [WeighedHost [host: (overcloud-compute-4.localdomain, overcloud-compute-4.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, weight: 1.0], WeighedHost [host: (overcloud-compute-1.localdomain, overcloud-compute-1.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, weight: 1.0], WeighedHost [host: (overcloud-compute-0.localdomain, overcloud-compute-0.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, weight: 1.0], WeighedHost [host: (overcloud-compute-3.localdomain, overcloud-compute-3.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, weight: 1.0], WeighedHost [host: (overcloud-compute-5.localdomain, overcloud-compute-5.localdomain) ram:252625 disk:45631488 io_ops:0 instances:3, weight: 0.98823313109], WeighedHost [host: (overcloud-compute-6.localdomain, overcloud-compute-6.localdomain) ram:251601 disk:45631488 io_ops:0 instances:4, weight: 0.984227388483], WeighedHost [host: (overcloud-compute-2.localdomain, overcloud-compute-2.localdomain) ram:251601 disk:45631488 io_ops:0 instances:4, weight: 0.984227388483]] _schedule /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:157
2016-12-27 07:59:18.044 45111 DEBUG nova.scheduler.filter_scheduler [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Selected host: WeighedHost [host: (overcloud-compute-4.localdomain, overcloud-compute-4.localdomain) ram:255633 disk:45631488 io_ops:0 instances:0, weight: 1.0] _schedule /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:167
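To see at a glance whether any filter narrowed the candidate list for a given request, the DEBUG lines above can be summarized with a short script. This is a minimal sketch that assumes the scheduler log format shown in the excerpt (filter results logged at DEBUG level). The helper name `summarize_scheduling` and the trimmed sample lines are hypothetical illustrations and are not part of nova.

```python
import re

# "Starting with 7 host(s)"
START_RE = re.compile(r"Starting with (\d+) host\(s\)")
# "Filter RamFilter returned 7 host(s)"
FILTER_RE = re.compile(r"Filter (\w+) returned (\d+) host\(s\)")
# "Selected host: WeighedHost [host: (overcloud-compute-4.localdomain, ..."
SELECTED_RE = re.compile(r"Selected host: WeighedHost \[host: \(([^,]+),")


def summarize_scheduling(lines, request_id):
    """Hypothetical helper: return (start_count, [(filter, hosts_left)], selected_host)
    for the log lines that belong to one request ID."""
    start = None
    filters = []
    selected = None
    for line in lines:
        if request_id not in line:
            continue  # skip lines from other scheduling requests
        m = START_RE.search(line)
        if m:
            start = int(m.group(1))
            continue
        m = FILTER_RE.search(line)
        if m:
            filters.append((m.group(1), int(m.group(2))))
            continue
        m = SELECTED_RE.search(line)
        if m:
            selected = m.group(1)
    return start, filters, selected


if __name__ == "__main__":
    req = "req-9c177f30-fd83-4310-8a93-58c1f833c09e"
    # Trimmed sample lines based on the log excerpt above.
    sample = [
        "DEBUG nova.filters [%s ...] Starting with 7 host(s)" % req,
        "DEBUG nova.filters [%s ...] Filter AvailabilityZoneFilter returned 7 host(s)" % req,
        "DEBUG nova.filters [%s ...] Filter PciPassthroughFilter returned 7 host(s)" % req,
        "DEBUG nova.scheduler.filter_scheduler [%s ...] Selected host: WeighedHost "
        "[host: (overcloud-compute-4.localdomain, overcloud-compute-4.localdomain) ram:255633" % req,
    ]
    start, filters, selected = summarize_scheduling(sample, req)
    print("start: %d" % start)
    for name, count in filters:
        # A filter whose count equals the previous count removed no hosts.
        print("%-25s %d" % (name, count))
    print("selected: %s" % selected)
```

In the excerpt above, every filter (including AvailabilityZoneFilter and PciPassthroughFilter) returns all 7 starting hosts, so none of them excluded overcloud-compute-4 before it was selected.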

----------------- From: /var/log/nova/nova-compute.log on the compute node -----------------

2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] [instance: db3f1082-fde5-4430-bb45-dd0621ffff51] Instance failed to spawn
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51] Traceback (most recent call last):
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4346, in _unshelve_instance
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]     with rt.instance_claim(context, instance, limits):
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 254, in inner
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]     return f(*args, **kwargs)
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 173, in instance_claim
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]     overhead=overhead, limits=limits)
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]   File "/usr/lib/python2.7/site-packages/nova/compute/claims.py", line 90, in __init__
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]     self._claim_test(resources, limits)
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]   File "/usr/lib/python2.7/site-packages/nova/compute/claims.py", line 147, in _claim_test
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]     "; ".join(reasons))
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51] ComputeResourcesUnavailable: Insufficient compute resources: Claim pci failed..
2016-12-27 07:59:18.873 6768 ERROR nova.compute.manager [instance: db3f1082-fde5-4430-bb45-dd0621ffff51]
2016-12-27 07:59:19.035 6768 ERROR oslo_messaging.rpc.dispatcher [req-9c177f30-fd83-4310-8a93-58c1f833c09e d48c8bdef1a04c10be9ed725d27b1a57 3e6e475787ad4164ab9719c7aa75a477 - - -] Exception during message handling: Insufficient compute resources: Claim pci failed..
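To list which instances hit this failure after a bulk unshelve, the compute log can be scanned for the last line of the traceback. This is a minimal sketch that assumes the log format shown above. The regular expression and the `failed_pci_claims` helper are hypothetical illustrations rather than part of nova, and the exact message text may differ between nova releases.

```python
import re
import sys

# Final traceback line of a failed PCI claim, e.g.
# "[instance: <uuid>] ComputeResourcesUnavailable: Insufficient compute resources: Claim pci failed.."
CLAIM_FAIL_RE = re.compile(
    r"\[instance: ([0-9a-f-]{36})\] ComputeResourcesUnavailable: .*Claim pci failed"
)


def failed_pci_claims(lines):
    """Hypothetical helper: return instance UUIDs whose PCI claim failed,
    in first-seen order and without duplicates."""
    seen = {}
    for line in lines:
        m = CLAIM_FAIL_RE.search(line)
        if m:
            seen[m.group(1)] = None  # dict preserves insertion order (Python 3.7+)
    return list(seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_pci_claims.py /var/log/nova/nova-compute.log
    with open(sys.argv[1]) as log:
        for uuid in failed_pci_claims(log):
            print(uuid)
```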

Environment

  • Red Hat OpenStack Platform 8.0
  • Red Hat OpenStack Platform 9.0
  • Red Hat OpenStack Platform 10.0 with nova prior to 14.0.6-1
  • Instances spawned using SR-IOV NICs, with multiple AZs configured.
