After power cycling, no VMs are able to be created or moved to some computes
Issue
- Our datacenter suffered a power outage that lasted more than an hour. After power was restored, it was impossible to create VMs on, or migrate VMs to, compute nodes overcloud-compute-0 and overcloud-compute-1.
- We see the following error messages in /var/log/containers/nova/nova-compute.log:
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager [req-a5ca03b4-ea83-41da-a069-8006680c9d18 - - - - -] Error updating resources for node overcloud-compute-0.localdomain.: TimedOut: [errno 110] error connecting to the cluster
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager Traceback (most recent call last):
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7573, in update_available_resource_for_node
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 690, in update_available_resource
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6467, in get_available_resource
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager disk_info_dict = self._get_local_gb_info()
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5734, in _get_local_gb_info
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager info = LibvirtDriver._get_rbd_driver().get_pool_info()
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 368, in get_pool_info
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager with RADOSClient(self) as client:
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 102, in __init__
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager self.cluster, self.ioctx = driver._connect_to_rados(pool)
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 133, in _connect_to_rados
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager client.connect()
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager File "rados.pyx", line 885, in rados.Rados.connect (/builddir/build/BUILD/ceph-12.2.12/build/src/pybind/rados/pyrex/rados.c:9785)
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager TimedOut: [errno 110] error connecting to the cluster
2022-06-13 04:28:33.064 7 ERROR nova.compute.manager
2022-06-13 04:28:33.066 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [8808e101-b9b5-469c-81a1-6ebd22fe4de7] AMQP server on overcloud-controller-2.localdomain:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe
2022-06-13 04:28:33.066 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [29696748-533d-41c2-ae75-18439c4ffe79] AMQP server on overcloud-controller-2.localdomain:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe
2022-06-13 04:28:33.068 7 ERROR oslo.messaging._drivers.impl_rabbit [-] [2ad8a021-678f-4f4e-9711-537f208b70ba] AMQP server on overcloud-controller-0.localdomain:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe
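To gauge how widespread these two errors are on a compute node, the log can be scanned for the messages shown above. The following is a minimal sketch, not a supported tool: the function names `summarize` and `summarize_file` and the category labels are hypothetical, and the patterns are copied from this excerpt only, so other releases may word the messages differently.

```python
import re
from collections import Counter

# Patterns taken verbatim from the log excerpt; the dictionary keys are
# illustrative labels, not Nova or oslo.messaging terminology.
PATTERNS = {
    "ceph_connect_timeout": re.compile(
        r"TimedOut: \[errno 110\] error connecting to the cluster"
    ),
    "amqp_unreachable": re.compile(
        r"AMQP server on (?P<host>\S+?):\d+ is unreachable"
    ),
}


def summarize(lines):
    """Count matching log lines per category and per AMQP host.

    Counts are per line, not per failure: a single Ceph timeout appears
    both in the error header and in the last line of its traceback.
    """
    counts = Counter()
    amqp_hosts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[name] += 1
            if name == "amqp_unreachable":
                amqp_hosts[match.group("host")] += 1
    return counts, amqp_hosts


def summarize_file(path="/var/log/containers/nova/nova-compute.log"):
    """Apply summarize() to a log file; the default path is the one cited above."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

Calling `summarize_file()` on an affected compute node returns two counters: one with the number of matching lines per category and one with the number of unreachable-broker messages per controller host.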
Environment
- Red Hat OpenStack Platform 13.0 (RHOSP)