VMs down on multiple compute nodes
Issue
- We are observing multiple VMs down on different compute nodes. We have tried rebooting the affected compute nodes, but the issue remains.
- We have also tried powering the servers off and on, but the VMs still do not recover:
(overcloud) [stack@invikl08klm1dirx01mv ~]$ openstack server show 23d3d0e4-db1a-41ff-9a72-e6f2a8bddc75 --fit
+-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | comp6 |
| OS-EXT-SRV-ATTR:host | overcloud-compute-0.localdomain |
| OS-EXT-SRV-ATTR:hypervisor_hostname | overcloud-compute-0.localdomain |
| OS-EXT-SRV-ATTR:instance_name | instance-00000026 |
| OS-EXT-STS:power_state | NOSTATE |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2020-09-30T19:15:40.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | sriov_eno2_net_Central_613=172.150.128.3; sriov_eno1_net_Central_613=172.150.128.9; |
| | sriov_eno1_net_STL_IPC_internal_209=172.150.146.3; sriov_eno1_net_Mgmt_Test_606=10.103.254.5; |
| | sriov_eno2_net_STL_IPC_internal_209=172.150.146.12; sriov_eno2_net_Mgmt_Test_606=10.103.254.12; |
| | sriov_eno1_net_Mediation_RA_612=172.150.127.19; sriov_eno2_net_Mediation_RA_612=172.150.127.11 |
| config_drive | |
| created | 2020-09-30T19:15:10Z |
| fault | {u'message': u'libvirtError', u'code': 500, u'details': u'Traceback (most recent call last):\n File "/usr/lib/python2.7/site- |
| | packages/nova/compute/manager.py", line 202, in decorated_function\n return function(self, context, *args, **kwargs)\n |
| | File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3282, in reboot_instance\n |
| | self._set_instance_obj_error_state(context, instance)\n File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line |
| | 220, in __exit__\n self.force_reraise()\n File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in |
| | force_reraise\n six.reraise(self.type_, self.value, self.tb)\n File "/usr/lib/python2.7/site- |
| | packages/nova/compute/manager.py", line 3257, in reboot_instance\n bad_volumes_callback=bad_volumes_callback)\n File |
| | "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2725, in reboot\n block_device_info)\n File |
| | "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2841, in _hard_reboot\n vifs_already_plugged=True)\n |
| | File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5663, in _create_domain_and_network\n |
| | destroy_disks_on_failure)\n File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n |
| | self.force_reraise()\n File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n |
| | six.reraise(self.type_, self.value, self.tb)\n File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line |
| | 5632, in _create_domain_and_network\n post_xml_callback=post_xml_callback)\n File "/usr/lib/python2.7/site- |
| | packages/nova/virt/libvirt/driver.py", line 5567, in _create_domain\n guest.launch(pause=pause)\n File "/usr/lib/python2.7 |
| | /site-packages/nova/virt/libvirt/guest.py", line 144, in launch\n self._encoded_xml, errors=\'ignore\')\n File |
| | "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n File |
| | "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, |
| | self.tb)\n File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch\n return |
| | self._domain.createWithFlags(flags)\n File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit\n |
| | result = proxy_call(self._autowrap, f, *args, **kwargs)\n File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line |
| | 144, in proxy_call\n rv = execute(f, *args, **kwargs)\n File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line |
| | 125, in execute\n six.reraise(c, e, tb)\n File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker\n |
| | rv = meth(*args, **kwargs)\n File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags\n if ret |
| | == -1: raise libvirtError (\'virDomainCreateWithFlags() failed\', dom=self)\nlibvirtError: internal error: couldn\'t find |
| | IFLA_VF_INFO for VF 18 in netlink response\n', u'created': u'2021-08-30T15:13:17Z'} |
- journalctl -x returns errors similar to the following:
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0d.5: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:05.4: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:04.5: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.7: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.5: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: enabling device (0000 -> 0002)
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:d8:02.0 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 dbus[8391]: [system] Activating via systemd: service name='org.freedesktop.machine1' unit='dbus-org.freedesktop.machine1.service'
Aug 30 12:33:22 overcloud-compute-0 dbus[8391]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.machine1.service': Refusing activation, D-Bus is shutting down.
Aug 30 12:33:22 overcloud-compute-0 dockerd-current[17200]: 2021-08-30 07:03:22.101+0000: 836241: error : virSystemdTerminateMachine:444 : Refusing activation, D-Bus is shutting down.
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:86:0b.2 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:86:0a.2 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:d8:05.4 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:86:0d.5 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:d8:04.5 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:d8:02.5 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: DMAR: 64bit 0000:86:0a.7 uses identity mapping
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: irq 1091 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: irq 1092 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: irq 1093 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: irq 1094 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: irq 1095 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: Multiqueue Enabled: Queue pair count = 4
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: MAC address: ee:95:ac:56:af:66
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:d8:02.0: GRO is enabled
Aug 30 12:33:22 overcloud-compute-0 NetworkManager[14475]: <info> [1630307002.1646] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/341)
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: irq 1096 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: irq 1097 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: irq 1098 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: irq 1099 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: irq 1100 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: Multiqueue Enabled: Queue pair count = 4
Aug 30 12:33:22 overcloud-compute-0 NetworkManager[14475]: <info> [1630307002.4337] device (eth0): interface index 341 renamed iface from 'eth0' to 'enp216s2'
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: MAC address: 26:cc:90:57:ca:d8
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0b.2: GRO is enabled
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: irq 1111 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: irq 1112 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: irq 1113 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: irq 1114 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: irq 1115 for MSI/MSI-X
Aug 30 12:33:22 overcloud-compute-0 kernel: iavf 0000:86:0a.2: Multiqueue Enabled: Queue pair count = 4
...skipping...
Aug 30 12:41:26 overcloud-compute-0 kernel: iavf 0000:1a:09.0: Admin queue command never completed
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.1: add vsi failed for VF 2, aq_err 0
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed for VF 1, aq_err 0
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.1: Set default VSI failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.1: Setting promiscuous on failed on PF, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:25 overcloud-compute-0 kernel: iavf 0000:1a:06.4: Device is still in reset (-16), retrying
Aug 30 12:41:25 overcloud-compute-0 kernel: iavf 0000:1a:06.2: Admin queue command never completed
Aug 30 12:41:26 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VF reset check timeout on VF 1
Aug 30 12:41:26 overcloud-compute-0 kernel: iavf 0000:1a:09.0: Admin queue command never completed
Aug 30 12:41:26 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:26 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed for VF 1, aq_err 0
Aug 30 12:41:26 overcloud-compute-0 kernel: iavf 0000:1a:05.2: Admin queue command never completed
Aug 30 12:41:26 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VF 6 failed to set multicast promiscuous mode err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:26 overcloud-compute-0 kernel: iavf 0000:1a:05.3: Admin queue command never completed
Aug 30 12:41:26 overcloud-compute-0 kernel: i40e 0000:1a:00.0: Error OK, forcing overflow promiscuous on VF 6
Aug 30 12:41:27 overcloud-compute-0 kernel: iavf 0000:1a:05.1: Admin queue command never completed
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.0: ignoring delete macvlan error on VF 6, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
Aug 30 12:41:27 overcloud-compute-0 kernel: iavf 0000:1a:07.4: Admin queue command never completed
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.1: VSI seid 440 Rx ring 145 disable timeout
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed for VF 6, aq_err 0
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VSI seid 425 Rx ring 213 disable timeout
Aug 30 12:41:27 overcloud-compute-0 kernel: i40e 0000:1a:00.1: VF 4 failed to set multicast promiscuous mode err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.1: Error OK, forcing overflow promiscuous on VF 4
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VF 21 failed to set multicast promiscuous mode err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.1: ignoring delete macvlan error on VF 4, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.0: Error OK, forcing overflow promiscuous on VF 21
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.0: ignoring delete macvlan error on VF 21, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.1: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:28 overcloud-compute-0 kernel: i40e 0000:1a:00.1: add vsi failed for VF 4, aq_err 0
Aug 30 12:41:28 overcloud-compute-0 kernel: iavf 0000:1a:06.2: Admin queue command never completed
Aug 30 12:41:29 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:29 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed for VF 21, aq_err 0
Aug 30 12:41:29 overcloud-compute-0 kernel: iavf 0000:1a:09.0: Admin queue command never completed
Aug 30 12:41:29 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VSI seid 435 Rx ring 253 disable timeout
Aug 30 12:41:29 overcloud-compute-0 kernel: iavf 0000:1a:06.4: Admin queue command never completed
Aug 30 12:41:29 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VF 31 failed to set multicast promiscuous mode err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:29 overcloud-compute-0 kernel: iavf 0000:1a:07.4: Admin queue command never completed
Aug 30 12:41:29 overcloud-compute-0 kernel: i40e 0000:1a:00.0: Error OK, forcing overflow promiscuous on VF 31
Aug 30 12:41:29 overcloud-compute-0 kernel: iavf 0000:1a:05.1: Admin queue command never completed
Aug 30 12:41:30 overcloud-compute-0 kernel: i40e 0000:1a:00.0: ignoring delete macvlan error on VF 31, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
Aug 30 12:41:30 overcloud-compute-0 kernel: iavf 0000:1a:05.3: Admin queue command never completed
Aug 30 12:41:30 overcloud-compute-0 kernel: iavf 0000:1a:05.2: Admin queue command never completed
Aug 30 12:41:30 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:30 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed for VF 31, aq_err 0
Aug 30 12:41:30 overcloud-compute-0 kernel: i40e 0000:1a:00.1: VSI seid 441 Rx ring 149 disable timeout
Aug 30 12:41:30 overcloud-compute-0 kernel: i40e 0000:1a:00.0: VF reset check timeout on VF 1
Aug 30 12:41:31 overcloud-compute-0 kernel: i40e 0000:1a:00.1: VF 5 failed to set multicast promiscuous mode err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:31 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
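To see which devices are affected, it can help to tally the admin queue failures per PCI address instead of reading the journal line by line. The following is a minimal sketch, not a supported tool: the function name summarize_admin_queue_errors and the SAMPLE text are hypothetical, and the regular expression assumes the kernel message layout shown in the excerpt above.

```python
import re
from collections import Counter

# Matches kernel lines from the i40e (PF) and iavf (VF) drivers, for example:
#   "... kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK"
# Assumption: the journal uses the same layout as the excerpt in this article.
LINE_RE = re.compile(
    r"kernel: (?P<driver>i40e|iavf) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)

# Message fragments that indicate an admin queue failure in the excerpt.
MARKERS = ("I40E_ERR_ADMIN_QUEUE_TIMEOUT", "Admin queue command never completed")


def summarize_admin_queue_errors(journal_text):
    """Count admin queue failures per (driver, PCI address)."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = LINE_RE.search(line)
        if match and any(marker in match.group("msg") for marker in MARKERS):
            counts[(match.group("driver"), match.group("pci"))] += 1
    return counts


# A few lines copied from the journal excerpt, used here as hypothetical input.
SAMPLE = """\
Aug 30 12:41:26 overcloud-compute-0 kernel: iavf 0000:1a:09.0: Admin queue command never completed
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.1: add vsi failed for VF 2, aq_err 0
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.0: add vsi failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT aq_err OK
Aug 30 12:41:25 overcloud-compute-0 kernel: i40e 0000:1a:00.1: Set default VSI failed, err I40E_ERR_ADMIN_QUEUE_TIMEOUT, aq_err OK
"""

for (driver, pci), count in sorted(summarize_admin_queue_errors(SAMPLE).items()):
    print(driver, pci, count)
```

In practice the input would be the saved journal text, for example the output of journalctl -k redirected to a file. In the excerpt above, the timeouts are all reported for the physical functions 0000:1a:00.0 and 0000:1a:00.1 and the VFs on bus 1a.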
- The output of ip link show for the SR-IOV interfaces (eno1, eno2) is missing some VFs, for example:
[overcloud-compute-0 ~]# ip link show eno2
3: eno2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 84:13:9f:31:ac:54 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 8 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 11 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 14 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 16 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 17 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 18 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 19 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 20 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 22 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 23 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 24 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 25 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 26 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 27 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 28 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 29 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
[root@overcloud-compute-0 ~]# exit
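The gaps in the VF list can also be found programmatically. The sketch below is illustrative only: missing_vfs and SAMPLE are hypothetical names, and the expected VF count is an assumption that has to be supplied by the caller (on a live host it can be read from /sys/class/net/<interface>/device/sriov_numvfs).

```python
import re

# Matches the per-VF lines printed by "ip link show", e.g. "vf 3 MAC 00:00:...".
VF_RE = re.compile(r"^\s*vf (\d+)\b")


def missing_vfs(ip_link_output, expected_count):
    """Return the sorted VF indices in range(expected_count) that are absent."""
    present = set()
    for line in ip_link_output.splitlines():
        match = VF_RE.match(line)
        if match:
            present.add(int(match.group(1)))
    return sorted(set(range(expected_count)) - present)


# Shortened, hypothetical input in the same format as the eno2 output above.
SAMPLE = """\
3: eno2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 84:13:9f:31:ac:54 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
"""

print(missing_vfs(SAMPLE, expected_count=6))
```

Applied to the full eno2 output above, and assuming at least 30 VFs were configured on that interface, the same check would report VFs 1, 2, 4, 6, 12 and 21 as missing within indices 0 to 29.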
Environment
- Red Hat OpenStack Platform 13.0 (RHOSP)