VM went into ERROR state after compute node reboot
Issue
- One of two VMs on an SR-IOV network with Intel 25G NICs went into ERROR state after the compute node was rebooted as part of a CVIM update:
+--------------------------------------+------------+--------+---------------------------------+------------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------+--------+---------------------------------+------------------+-----------+
| 609ecc91-f740-41e2-885c-ad1a47eb5628 | sriov-vm-7 | ERROR | prov-sriov=10.8.100.76 | RHEL-guest-image | m1.medium |
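| 5f6d76e0-165f-486f-8870-2cb98cefee22 | sriov-vm-6 | ACTIVE | prov-sriov=10.8.100.69 | RHEL-guest-image | m1.medium |
+--------------------------------------+------------+--------+---------------------------------+------------------+-----------+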
- The following error appears for that VM in the nova logs:
nova/nova-compute.log:51205:2022-04-20 21:40:27.479 9 ERROR nova.virt.libvirt.driver [req-d5f2d5a5-fac4-4fce-bccf-10b476109584 - - - - -] [instance: 609ecc91-f740-41e2-885c-ad1a47eb5628] Failed to start libvirt guest: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2022-04-21T04:40:27.114214Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
- A Red Hat solution describes a similar [issue](https://access.redhat.com/solutions/5253171), but in this case the VM is not even listed in the virsh list output on that compute node.
- We also see the following message in the libvirt logs:
"qemu-kvm: -device vfio-pci,host=0000:d8:02.5,id=hostdev0,bus=pci.0,addr=0x4: vfio 0000:d8:02.5: failed to get group 88 status: No such device or address"
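The vfio message above names a PCI address (0000:d8:02.5) and an IOMMU group (88). As a rough diagnostic sketch, and not part of the Red Hat solution, the script below extracts both values from such a line and reads the matching sysfs entries on the compute node, which can show whether the VF still exists and which driver it is bound to. It assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>` with `driver`, `iommu_group`, and `physfn` symlinks, and `/dev/vfio/<group>`). The function names `parse_vfio_error` and `inspect_vf` are hypothetical helpers introduced only for illustration.

```python
import os
import re

# Matches the PCI address and IOMMU group number in a qemu-kvm vfio error line.
VFIO_ERROR = re.compile(
    r"vfio (?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"failed to get group (?P<group>\d+) status"
)


def parse_vfio_error(line):
    """Return (pci_address, iommu_group) from a vfio error line, or None."""
    match = VFIO_ERROR.search(line)
    if match is None:
        return None
    return match.group("addr"), int(match.group("group"))


def inspect_vf(addr, sysfs_root="/sys", dev_root="/dev"):
    """Collect the sysfs state of a PCI device without modifying anything."""
    dev_dir = os.path.join(sysfs_root, "bus", "pci", "devices", addr)

    def link_name(name):
        # Each entry is a symlink; the basename of its target is the value.
        path = os.path.join(dev_dir, name)
        return os.path.basename(os.readlink(path)) if os.path.islink(path) else None

    group = link_name("iommu_group")
    return {
        "present": os.path.isdir(dev_dir),      # does the VF exist at all?
        "driver": link_name("driver"),          # e.g. vfio-pci, or None if unbound
        "iommu_group": group,                   # group the kernel reports now
        "physfn": link_name("physfn"),          # parent PF, set only for VFs
        "vfio_node": group is not None
        and os.path.exists(os.path.join(dev_root, "vfio", group)),
    }


if __name__ == "__main__":
    line = (
        "qemu-kvm: -device vfio-pci,host=0000:d8:02.5,id=hostdev0,bus=pci.0,addr=0x4: "
        "vfio 0000:d8:02.5: failed to get group 88 status: No such device or address"
    )
    addr, group = parse_vfio_error(line)
    print(addr, group, inspect_vf(addr))
```

Run on the affected compute node, a result with `present` set to False, or an `iommu_group` that differs from the group in the log, would suggest that the VF layout changed across the reboot. That is only a hint for further investigation, not a confirmed root cause.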
Environment
- Red Hat OpenStack Platform 16.1 (RHOSP)