Unable to create VM with SRIOV during upgrade from v13 to v16.1

Solution In Progress

Issue

  • We upgraded the controller nodes from RHOSP 13 to RHOSP 16.1 a few days ago. The compute nodes are still pending upgrade. After the upgrade, running VMs continued to function as expected. However, when we redeploy any internal environment that uses SRIOV, stack creation fails with:
Resource CREATE failed: ResourceInError: resources.stratum_adm_rg.resources[8].resources.stratum_adm: Went to status ERROR due to "Message: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 719468a0-d05b-47fe-bae0-d52343862322. Last exception: Binding failed for port 48924662-d23f-486f-9434-953b51fb90d1, please check neutron logs for more information., Code: 500"
  • Looking into this further, it seems that nova has not picked up the available SRIOV VFs on these compute nodes. The VFs are configured as expected, but the usual "nova.compute.resource_tracker" entries showing that they have been picked up do not appear in /var/log/containers/nova/nova-compute.log.

  • Seeing the following error when trying to deploy a VM:

2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [req-e4efe4f5-cd23-4b00-90c2-ebfbd288b7e4 77157773674a017edb17ca250b5ffbb97a4b08e09e7de56f7088264b15bdd939 7808a9bf33d64df5b925ae3f695a2ba9 - 8cf3016a6a3f4b3197fb230609c225f7 8cf3016a6a3f4b3197fb230609c225f7] [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] Instance failed to spawn: nova.exception.PortBindingFailed: Binding failed for port bd0a0f31-2e3f-47ee-baf2-0752f3b7f26a, please check neutron logs for more information.
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] Traceback (most recent call last):
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2668, in _build_resources
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     yield resources
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     block_device_info=block_device_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3664, in spawn
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     mdevs=mdevs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6272, in _get_guest_xml
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     network_info_str = str(network_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 616, in __str__
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     return self._sync_wrapper(fn, *args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 599, in _sync_wrapper
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     self.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 631, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     self[:] = self._gt.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 181, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     return self._exit_event.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 132, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     current.throw(*self._exc)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 221, in main
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     result = function(*args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/utils.py", line 675, in context_wrapper
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     return func(*args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1716, in _allocate_network_async
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     six.reraise(*exc_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     raise value
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1699, in _allocate_network_async
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     resource_provider_mapping=resource_provider_mapping)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1040, in allocate_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     bind_host_id, requested_ports_dict)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1169, in _update_ports_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     vif.destroy()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     self.force_reraise()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     six.reraise(self.type_, self.value, self.tb)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     raise value
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1139, in _update_ports_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     port_client, instance, port_id, port_req_body)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 513, in _update_port
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     _ensure_no_port_binding_failure(port)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]   File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 236, in _ensure_no_port_binding_failure
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]     raise exception.PortBindingFailed(port_id=port['id'])
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] nova.exception.PortBindingFailed: Binding failed for port bd0a0f31-2e3f-47ee-baf2-0752f3b7f26a, please check neutron logs for more information.
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]
  • We have also seen this:
2021-11-15 20:03:05.336 8 ERROR nova.compute.manager [req-1dbba7fa-379f-415b-be02-2124a3fa6132 - - - - -] [instance: 140a703c-2020-4261-b868-3c880ad7f40f] An error occurred while refreshing the network cache.: neutronclient.common.exceptions.ServiceUnavailable: <html><body><h1>503 Service Unavailable</h1>
  • And the following error from the neutron SRIOV NIC agent, followed by a pyroute2 warning:
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ceb23660-ba75-49ea-a214-523dba082d2d - - - - -] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     result = f(*args, **kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     mac = self.get_pci_device(pci_slot)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     vfs = ip.link.get_vfs()
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 73, in sync_inner
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return input_func(*args, **kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return self.channel.remote_call(name, args, kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     raise exc_type(*result[2])
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2021-11-17 11:56:19.257 138623 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ceb23660-ba75-49ea-a214-523dba082d2d - - - - -] Agent out of sync with plugin!
2021-11-17 11:56:19.401 156640 WARNING pyroute2.netlink [-] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1276, in _ft_decode_generic
    self.decode_nlas(offset)
  File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1401, in decode_nlas
    offset)
struct.error: unpack_from requires a buffer of at least 4 bytes
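
The log check described in the Issue section can be sketched in Python. This is a minimal sketch, not a diagnostic tool: it assumes the log lines follow the "date time pid LEVEL logger ..." layout seen in the excerpts above, and the function name summarize and its tracker_logger parameter are hypothetical names chosen for illustration. It counts how many lines come from the "nova.compute.resource_tracker" logger and how many ERROR lines each logger produced.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the excerpts in this article:
#   "<date> <time> <pid> <LEVEL> <logger> <rest of message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<rest>.*)$"
)


def summarize(lines, tracker_logger="nova.compute.resource_tracker"):
    """Return (tracker_line_count, Counter of ERROR lines per logger).

    Lines that do not match the assumed layout (for example the bare
    traceback lines printed by pyroute2) are skipped.
    """
    tracker_lines = 0
    errors = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        if match.group("logger") == tracker_logger:
            tracker_lines += 1
        if match.group("level") == "ERROR":
            errors[match.group("logger")] += 1
    return tracker_lines, errors


if __name__ == "__main__":
    # Usage: python summarize_log.py /var/log/containers/nova/nova-compute.log
    with open(sys.argv[1], errors="replace") as handle:
        count, error_counts = summarize(handle)
    print("resource_tracker lines:", count)
    for logger, total in error_counts.most_common():
        print(logger, total)
```

A resource_tracker count of zero would only confirm the observation that the usual entries are missing from the log; it does not by itself identify why the VFs were not picked up.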

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
