Unable to create a VM with SR-IOV during upgrade from v13 to v16.1
Issue
- We upgraded the controller nodes from RHOSP 13 to RHOSP 16.1 a few days ago; the compute nodes are still pending upgrade. After the upgrade, existing VMs continued to function as expected. However, when we redeploy any internal environment that uses SR-IOV, stack creation fails with:
Resource CREATE failed: ResourceInError: resources.stratum_adm_rg.resources[8].resources.stratum_adm: Went to status ERROR due to "Message: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 719468a0-d05b-47fe-bae0-d52343862322. Last exception: Binding failed for port 48924662-d23f-486f-9434-953b51fb90d1, please check neutron logs for more information., Code: 500"
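To follow this up in the Neutron logs, the instance and port UUIDs first have to be pulled out of the fault text. A minimal Python sketch, assuming the fault keeps the wording quoted above; the helper name `parse_binding_fault` is hypothetical:

```python
import re

# Standard 8-4-4-4-12 lowercase hex UUID, as used in Nova/Neutron fault text.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"


def parse_binding_fault(message: str) -> dict:
    """Extract the instance and port UUIDs from a 'Binding failed' fault.

    Returns {'instance': ..., 'port': ...}; a value is None if not found.
    """
    instance = re.search(r"for instance (" + UUID + r")", message)
    port = re.search(r"Binding failed for port (" + UUID + r")", message)
    return {
        "instance": instance.group(1) if instance else None,
        "port": port.group(1) if port else None,
    }
```

For the fault above this yields instance 719468a0-d05b-47fe-bae0-d52343862322 and port 48924662-d23f-486f-9434-953b51fb90d1; the port UUID is what to search for in the Neutron server log, as the message suggests, and it can also be inspected with `openstack port show <port-id>`.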
- Looking into this further, it seems Nova has not picked up the available SR-IOV VFs on these compute nodes. The VFs are configured as expected, but we are not seeing the usual "nova.compute.resource_tracker" message in /var/log/containers/nova/nova-compute.log showing that they have been picked up.
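Before digging into Nova, it is worth confirming on each compute node that the kernel actually exposes the VFs. A minimal sketch, assuming the standard Linux sysfs layout where each physical function (PF) has `device/sriov_numvfs` and `device/sriov_totalvfs` under /sys/class/net; the function name `sriov_vf_counts` is hypothetical:

```python
import os


def sriov_vf_counts(sys_class_net: str = "/sys/class/net") -> dict:
    """Return {iface: (configured_vfs, max_vfs)} for every SR-IOV PF found.

    Reads device/sriov_numvfs and device/sriov_totalvfs for each interface;
    interfaces without both files (i.e. not SR-IOV PFs) are skipped.
    """
    counts = {}
    for iface in sorted(os.listdir(sys_class_net)):
        dev = os.path.join(sys_class_net, iface, "device")
        num_path = os.path.join(dev, "sriov_numvfs")
        total_path = os.path.join(dev, "sriov_totalvfs")
        if not (os.path.isfile(num_path) and os.path.isfile(total_path)):
            continue
        with open(num_path) as num_f, open(total_path) as total_f:
            # int() tolerates the trailing newline sysfs values carry.
            counts[iface] = (int(num_f.read()), int(total_f.read()))
    return counts
```

Calling `sriov_vf_counts()` on the host should list each PF with a non-zero configured count. Nova only reports VFs that also match its `[pci] passthrough_whitelist` setting in nova.conf, so a correct sysfs count does not by itself prove that Nova should see them.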
- We see the following error when trying to deploy a VM:
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [req-e4efe4f5-cd23-4b00-90c2-ebfbd288b7e4 77157773674a017edb17ca250b5ffbb97a4b08e09e7de56f7088264b15bdd939 7808a9bf33d64df5b925ae3f695a2ba9 - 8cf3016a6a3f4b3197fb230609c225f7 8cf3016a6a3f4b3197fb230609c225f7] [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] Instance failed to spawn: nova.exception.PortBindingFailed: Binding failed for port bd0a0f31-2e3f-47ee-baf2-0752f3b7f26a, please check neutron logs for more information.
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] Traceback (most recent call last):
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2668, in _build_resources
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] yield resources
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2442, in _build_and_run_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] block_device_info=block_device_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3664, in spawn
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] mdevs=mdevs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6272, in _get_guest_xml
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] network_info_str = str(network_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 616, in __str__
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] return self._sync_wrapper(fn, *args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 599, in _sync_wrapper
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] self.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/model.py", line 631, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] self[:] = self._gt.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 181, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] return self._exit_event.wait()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 132, in wait
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] current.throw(*self._exc)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 221, in main
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] result = function(*args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/utils.py", line 675, in context_wrapper
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] return func(*args, **kwargs)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1716, in _allocate_network_async
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] six.reraise(*exc_info)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] raise value
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1699, in _allocate_network_async
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] resource_provider_mapping=resource_provider_mapping)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1040, in allocate_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] bind_host_id, requested_ports_dict)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1169, in _update_ports_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] vif.destroy()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] self.force_reraise()
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] six.reraise(self.type_, self.value, self.tb)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] raise value
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 1139, in _update_ports_for_instance
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] port_client, instance, port_id, port_req_body)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 513, in _update_port
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] _ensure_no_port_binding_failure(port)
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] File "/usr/lib/python3.6/site-packages/nova/network/neutronv2/api.py", line 236, in _ensure_no_port_binding_failure
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] raise exception.PortBindingFailed(port_id=port['id'])
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f] nova.exception.PortBindingFailed: Binding failed for port bd0a0f31-2e3f-47ee-baf2-0752f3b7f26a, please check neutron logs for more information.
2021-11-15 17:46:40.873 8 ERROR nova.compute.manager [instance: ebb64162-6f8d-4cd7-a395-e2b110359f0f]
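The bottom frames of this traceback show that Nova does not decide on its own that the binding failed: `_ensure_no_port_binding_failure` inspects the port Neutron returned from the update and raises when Neutron has marked the binding as failed. The sketch below restates that check in plain Python for illustration; `PortBindingFailed` here is a stand-in class rather than Nova's, and `binding_failed` is the value Neutron writes to `binding:vif_type` when no mechanism driver could bind the port:

```python
# Value Neutron sets in binding:vif_type when no ML2 mechanism driver
# managed to bind the port on the requested host.
VIF_TYPE_BINDING_FAILED = "binding_failed"


class PortBindingFailed(Exception):
    """Illustrative stand-in for nova.exception.PortBindingFailed."""

    def __init__(self, port_id: str):
        super().__init__(
            f"Binding failed for port {port_id}, "
            "please check neutron logs for more information."
        )
        self.port_id = port_id


def ensure_no_port_binding_failure(port: dict) -> None:
    """Raise if Neutron reported that it could not bind this port."""
    if port.get("binding:vif_type") == VIF_TYPE_BINDING_FAILED:
        raise PortBindingFailed(port_id=port["id"])
```

Because Nova is only relaying Neutron's verdict, the root cause has to be found on the Neutron side. The `sriovnicswitch` mechanism driver binds a port only through an SR-IOV NIC agent registered on the target host, so errors from that agent on the compute nodes are a reasonable place to look next.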
- We have also seen this:
2021-11-15 20:03:05.336 8 ERROR nova.compute.manager [req-1dbba7fa-379f-415b-be02-2124a3fa6132 - - - - -] [instance: 140a703c-2020-4261-b868-3c880ad7f40f] An error occurred while refreshing the network cache.: neutronclient.common.exceptions.ServiceUnavailable: <html><body><h1>503 Service Unavailable</h1>
- And the following error:
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ceb23660-ba75-49ea-a214-523dba082d2d - - - - -] Error in agent loop. Devices info: {}: TypeError: can not serialize 'error' object
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info():
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs()
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 73, in sync_inner
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return input_func(*args, **kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs)
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2])
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: can not serialize 'error' object
2021-11-17 11:56:17.412 138623 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2021-11-17 11:56:19.257 138623 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ceb23660-ba75-49ea-a214-523dba082d2d - - - - -] Agent out of sync with plugin!
2021-11-17 11:56:19.401 156640 WARNING pyroute2.netlink [-] Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1276, in _ft_decode_generic
self.decode_nlas(offset)
File "/usr/lib/python3.6/site-packages/pyroute2/netlink/__init__.py", line 1401, in decode_nlas
offset)
struct.error: unpack_from requires a buffer of at least 4 bytes
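The pyroute2 warning shows its generic attribute decoder running out of bytes while parsing a netlink reply, presumably the VF listing requested through `get_link_vfs`. Netlink attributes are length/type/value records with a 4-byte header, so a reply that ends partway through a header makes `struct.unpack_from` raise `struct.error`. The class name of `struct.error` is `error`, which would also be consistent with the privsep daemon failing to send that exception back to the agent (`TypeError: can not serialize 'error' object`). A minimal decoder sketch that reproduces the failure on a truncated buffer; it is not pyroute2's implementation, and the names `walk_nlas` and `NLA_HDR` are hypothetical:

```python
import struct

# struct nlattr header: u16 nla_len, u16 nla_type, native byte order (4 bytes).
NLA_HDR = struct.Struct("=HH")
NLA_ALIGNTO = 4


def walk_nlas(buf: bytes, offset: int = 0) -> list:
    """Return [(nla_type, payload), ...] for the netlink attributes in buf."""
    attrs = []
    while offset < len(buf):
        # struct.error is raised here if fewer than 4 bytes remain,
        # i.e. the buffer ends in the middle of an attribute header.
        nla_len, nla_type = NLA_HDR.unpack_from(buf, offset)
        if nla_len < NLA_HDR.size or offset + nla_len > len(buf):
            raise ValueError(f"malformed attribute at offset {offset}: nla_len={nla_len}")
        attrs.append((nla_type, buf[offset + NLA_HDR.size:offset + nla_len]))
        # Each attribute is padded so the next one starts 4-byte aligned.
        offset += (nla_len + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)
    return attrs
```

A well-formed attribute such as `NLA_HDR.pack(8, 1) + b"abcd"` decodes to `[(1, b"abcd")]`; appending two stray bytes raises `struct.error`, the same exception class as in the warning above. The exact message text differs between Python versions.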
Environment
- Red Hat OpenStack Platform 16.1 (RHOSP)