error : qemuMonitorIORead:609 : Unable to read from monitor: Connection reset by peer

Solution In Progress - Updated

Issue

  • Some instances on the compute node were in NOSTATE. The libvirt domains for those instances no longer exist on the compute, but the disk files are still present in /var/lib/nova/instances. Attempting a hard reboot gives a libvirt/QEMU error (the same error occurs after resetting the state to active and then hard rebooting):
{u'message': u'libvirtError', u'code': 500, u'details': u'Traceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 205, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3403, in reboot_instance\n    do_reboot_instance(context, instance, block_device_info, reboot_type)\n  File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner\n    return f(*args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3402, in do_reboot_instance\n    reboot_type)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3492, in _reboot_instance\n    self._set_instance_obj_error_state(context, instance)\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3467, in _reboot_instance\n    bad_volumes_callback=bad_volumes_callback)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2871, in reboot\n    block_device_info)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2987, in _hard_reboot\n    vifs_already_plugged=True)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5848, in _create_domain_and_network\n    cleanup_instance_disks=cleanup_instance_disks)\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5814, in _create_domain_and_network\n    post_xml_callback=post_xml_callback)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5744, in _create_domain\n    guest.launch(pause=pause)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch\n    self._encoded_xml, errors=\'ignore\')\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch\n    return self._domain.createWithFlags(flags)\n  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit\n    result = proxy_call(self._autowrap, f, *args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call\n    rv = execute(f, *args, **kwargs)\n  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute\n    six.reraise(c, e, tb)\n  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker\n    rv = meth(*args, **kwargs)\n  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags\n    if ret == -1: raise libvirtError (\'virDomainCreateWithFlags() failed\', dom=self)\nlibvirtError: internal error: qemu unexpectedly closed the monitor: 2022-05-24T19:35:45.408478Z qemu-kvm: -drive 
file=/var/lib/nova/mnt/19aaa23bc979a58ee7accf3257988903/volume-d30d29bb-b74f-4d80-aff8-7087718a483c,format=raw,if=none,id=drive-virtio-disk1,serial=d30d29bb-b74f-4d80-aff8-7087718a483c,cache=none,discard=unmap,aio=native: \'serial\' is deprecated, please use the corresponding option of \'-device\' instead\n2022-05-24T19:35:45.411828Z qemu-kvm: -drive file=/var/lib/nova/mnt/19aaa23bc979a58ee7accf3257988903/volume-d30d29bb-b74f-4d80-aff8-7087718a483c,format=raw,if=none,id=drive-virtio-disk1,serial=d30d29bb-b74f-4d80-aff8-7087718a483c,cache=none,discard=unmap,aio=native: Could not find working O_DIRECT alignment\nTry cache.direct=off\n', u'created': u'2022-05-24T19:35:45Z'}
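For what it's worth, the last qemu-kvm message in the traceback ("Could not find working O_DIRECT alignment / Try cache.direct=off") points at direct I/O failing against the NFS-backed volume path. A minimal check we could run from the compute node, assuming a dd read with iflag=direct exercises the same direct-I/O path (this is our own sketch, not taken from any article), would be:

# Hedged diagnostic: O_DIRECT read of the volume file named in the traceback.
# An "Invalid argument" failure here would match qemu-kvm being unable to find
# a working O_DIRECT alignment on that mount.
[root@overcloud-compute-0 ~]# dd if=/var/lib/nova/mnt/19aaa23bc979a58ee7accf3257988903/volume-d30d29bb-b74f-4d80-aff8-7087718a483c of=/dev/null bs=4096 count=1 iflag=direct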
  • We also see these errors in /var/log/messages:
May 24 18:53:58 overcloud-compute-0 dockerd-current[4882]: 2022-05-24 18:53:58.653+0000: 11865: error : qemuProcessReportLogError:1924 : internal error: qemu unexpectedly closed the monitor: 2022-05-24T18:53:58.638190Z qemu
May 24 18:53:58 overcloud-compute-0 dockerd-current[4882]: 2022-05-24 18:53:58.652+0000: 11865: error : qemuMonitorIORead:609 : Unable to read from monitor: Connection reset by peer
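To gather more context on the compute node, these are the kinds of checks we could run next; the container name and log paths are assumptions for a containerized RHOSP 13 compute, not something already captured in the case:

# Pull the surrounding qemu/libvirt errors from the system log.
[root@overcloud-compute-0 ~]# grep -E 'qemuMonitorIORead|unexpectedly closed the monitor' /var/log/messages
# Per-instance qemu logs, if present, usually record why the monitor was closed.
[root@overcloud-compute-0 ~]# ls -ltr /var/log/containers/libvirt/qemu/
# Confirm which domains the libvirt inside the nova_libvirt container still knows about.
[root@overcloud-compute-0 ~]# docker exec nova_libvirt virsh list --all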
  • There was also a network equipment upgrade last week (May 18-19) that impacted the storage connection, which we believe is when the issue started. Most of the VMs currently have filesystem issues, but that is a separate issue the tenant can handle; the main problem is the libvirt/QEMU error. A mount check is sketched below.
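Given that maintenance window, a quick sanity check of the NFS mount backing these volumes seems worthwhile. This is a sketch based on the /var/lib/nova/mnt path shown in the traceback, not output we have already collected:

# Confirm the NFS backend is still mounted and responsive after the network change.
[root@overcloud-compute-0 ~]# mount | grep /var/lib/nova/mnt
[root@overcloud-compute-0 ~]# df -h /var/lib/nova/mnt/19aaa23bc979a58ee7accf3257988903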

  • We had the tenant shut down all the VMs on the compute, then restarted the nova_compute container and hard rebooted the VMs (those in ERROR first needed their state reset to active), and the VMs now appear to be active again. The commands are sketched below.
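Roughly the recovery sequence we followed, reconstructed from memory rather than a verbatim transcript (the tenant shut the guests down from inside the VMs first); the example UUID is the server from the event list below:

# Restart the containerized nova-compute service on the affected compute node.
[root@overcloud-compute-0 ~]# docker restart nova_compute
# For VMs stuck in ERROR, reset the state before issuing the hard reboot.
(overcloud) [stack@director ~]$ nova reset-state --active 08562da5-700b-4aec-bb1c-6ba3eb9181d3
(overcloud) [stack@director ~]$ openstack server reboot --hard 08562da5-700b-4aec-bb1c-6ba3eb9181d3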

  • Event list of one VM:

(overcloud) [stack@director ~]$ openstack server event list 08562da5-700b-4aec-bb1c-6ba3eb9181d3
+------------------------------------------+--------------------------------------+--------+----------------------------+
| Request ID                               | Server ID                            | Action | Start Time                 |
+------------------------------------------+--------------------------------------+--------+----------------------------+
| req-52c4adb2-0c16-4602-bac3-32e823f3f46c | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | reboot | 2022-05-24T20:12:43.000000 |
| req-63cdcc75-7869-47a0-aa4d-0e4267fd53eb | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | reboot | 2022-05-24T18:53:52.000000 |
| req-410a4c2d-ae55-47f1-8c23-7d8a933a70cc | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | reboot | 2022-05-24T18:49:40.000000 |
| req-9d84d86b-fbb8-4015-abf2-02ad7122658c | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | start  | 2022-05-24T18:36:17.000000 |
| req-2e52c877-d2a1-45eb-8669-082c682d20e8 | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | start  | 2022-05-24T18:33:59.000000 |
| req-85346873-7401-4a9a-bc8d-7a7cf8e9335d | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | start  | 2022-05-24T18:04:32.000000 |
| req-076915d3-0b95-4770-9e38-c4531c3ab74a | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | start  | 2022-05-24T17:55:31.000000 |
| req-9062e440-d1c3-4760-9449-80b09d56452c | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | stop   | 2022-05-24T17:54:22.000000 |
| req-95f8d825-5bd7-40c6-b8a9-2a2f46781a24 | 08562da5-700b-4aec-bb1c-6ba3eb9181d3 | create | 2022-05-16T18:22:37.000000 |
+------------------------------------------+--------------------------------------+--------+----------------------------+
  • Can you help us figure out what happened? We referred to this article, which led us to try restarting nova_compute, but we didn't need to make changes to nova.conf, so it might not be the same issue.

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
