OpenStack instance cannot be deleted after compute node reboot

Environment

Red Hat OpenStack 6.0
Ephemeral disk backend is Ceph

Issue

We created instances on the last remaining compute node of an availability zone and then rebooted the node. After the reboot, the VMs are in SHUTOFF state and can no longer be deleted. Neither nova force-delete nor nova reset-state helps.

Resolution

The Ceph cluster was not in a healthy state, so requests to delete the instances' storage on the backend could not be processed. After the Ceph cluster was repaired, it was possible to delete the instances after rebooting the last remaining compute node.
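
A minimal sketch of a pre-check that follows from this resolution: confirm the cluster reports HEALTH_OK before retrying the delete. The function name ceph_is_healthy is hypothetical, and the commented usage assumes the ceph CLI and a suitable keyring are available on the node where it runs.

```shell
# Succeeds only if the Ceph status text on stdin reports HEALTH_OK.
# ceph_is_healthy is an illustrative helper, not part of any Ceph or nova tooling.
ceph_is_healthy() {
    grep -q 'HEALTH_OK'
}

# Example use (assumes the ceph and nova CLIs are configured on this host):
#   if ceph health | ceph_is_healthy; then
#       nova delete 0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd
#   else
#       echo "Ceph is not healthy; fix the cluster before deleting instances" >&2
#   fi
```

Anything other than HEALTH_OK, including HEALTH_WARN as seen in the diagnostic steps, is treated as not healthy here. That is a deliberately conservative choice for this sketch.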

Diagnostic Steps

The following delete commands were issued for the instances in question:

$ nova delete 0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd
$ nova delete 61119bae-24d9-48ae-a6ed-4bf2918c5196
$ nova delete 0700b9f1-a664-40fe-917a-481f387d789f

The nova-compute log shows that the delete requests were processed:

$ grep "Instance destroyed" nova-compute.log
2016-01-14 17:52:02.037 7627 INFO nova.virt.libvirt.driver [-] [instance: 0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd] Instance destroyed successfully.
2016-01-14 17:52:02.147 7627 INFO nova.virt.libvirt.driver [-] [instance: 61119bae-24d9-48ae-a6ed-4bf2918c5196] Instance destroyed successfully.
2016-01-14 17:52:15.965 7627 INFO nova.virt.libvirt.driver [-] [instance: 0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd] Instance destroyed successfully.
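
When several instances are involved, it can help to summarize the "Instance destroyed" messages per UUID instead of reading the raw grep output. The following is a sketch only; the function name destroyed_per_instance is hypothetical, and it assumes the log line format shown above.

```shell
# Count "Instance destroyed successfully" messages per instance UUID.
# Reads nova-compute log text on stdin and prints "<count> <uuid>" lines.
destroyed_per_instance() {
    grep 'Instance destroyed successfully' |
        # Keep only the UUID from the "[instance: <uuid>]" field.
        sed -n 's/.*\[instance: \([0-9a-f-]*\)\].*/\1/p' |
        sort | uniq -c |
        # Normalize the padded output of uniq -c to "<count> <uuid>".
        awk '{ print $1, $2 }'
}

# Example use:
#   destroyed_per_instance < nova-compute.log
```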

More details, for example for instance 0700b9f1-a664-40fe-917a-481f387d789f:

2016-01-14 17:52:02.004 7627 DEBUG nova.compute.manager [req-4b5e5357-4e1f-4bc3-be57-78fecbddb01f None] [instance: 0700b9f1-a664-40fe-917a-481f387d789f] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1217
2016-01-14 17:52:02.005 7627 DEBUG nova.compute.manager [req-4b5e5357-4e1f-4bc3-be57-78fecbddb01f None] [instance: 0700b9f1-a664-40fe-917a-481f387d789f] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2606
2016-01-14 17:52:02.006 7627 INFO nova.compute.manager [req-4b5e5357-4e1f-4bc3-be57-78fecbddb01f None] [instance: 0700b9f1-a664-40fe-917a-481f387d789f] Instance is already powered off in the hypervisor when stop is called.
2016-01-14 17:52:02.052 7627 INFO nova.virt.libvirt.driver [req-4b5e5357-4e1f-4bc3-be57-78fecbddb01f None] [instance: 0700b9f1-a664-40fe-917a-481f387d789f] Instance already shutdown.
2016-01-14 17:52:02.055 7627 INFO nova.virt.libvirt.driver [-] [instance: 0700b9f1-a664-40fe-917a-481f387d789f] Instance destroyed successfully.

However, the libvirt configuration files show that the domain definitions for these three instances are still present:

$ grep \<uuid etc/libvirt/qemu/instance-0000003*
etc/libvirt/qemu/instance-00000030.xml:  <uuid>0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd</uuid>
etc/libvirt/qemu/instance-00000033.xml:  <uuid>61119bae-24d9-48ae-a6ed-4bf2918c5196</uuid>
etc/libvirt/qemu/instance-00000036.xml:  <uuid>0700b9f1-a664-40fe-917a-481f387d789f</uuid>
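
The same check can be scripted for a list of instance UUIDs. This is a sketch under assumptions: the function name leftover_domains is hypothetical, and the directory argument is whatever path holds the instance-*.xml files (etc/libvirt/qemu in the collected data above).

```shell
# Print each given UUID that still has a libvirt domain XML in a directory.
# $1: directory containing instance-*.xml files; remaining args: instance UUIDs.
leftover_domains() {
    dir=$1
    shift
    for uuid in "$@"; do
        # -q: no output, -s: stay silent if no XML files exist in the directory.
        if grep -qs "<uuid>$uuid</uuid>" "$dir"/instance-*.xml; then
            echo "$uuid"
        fi
    done
}

# Example use:
#   leftover_domains etc/libvirt/qemu \
#       0c45cec5-53fc-43e6-9cb7-bbff6eee5dcd \
#       61119bae-24d9-48ae-a6ed-4bf2918c5196 \
#       0700b9f1-a664-40fe-917a-481f387d789f
```

Any UUID printed by the function has been reported as destroyed by nova but is still defined in libvirt.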

Checking the status of the Ceph backend shows that the cluster is not healthy. 3 out of 7 OSDs are down, and 128 placement groups are incomplete:

$ cat ceph_status 
    cluster 30345036-8571-407f-bce7-7af3c347eb46
     health HEALTH_WARN 128 pgs incomplete; 128 pgs stuck inactive; 128 pgs stuck unclean; 1 requests are blocked > 32 sec
     monmap e1: 3 mons at {controller-1=192.254.100.13:6789/0,controller-2=192.254.100.14:6789/0,controller-3=192.254.100.12:6789/0}, election epoch 10, quorum 0,1,2 controller-3,controller-1,controller-2
     osdmap e119: 7 osds: 4 up, 4 in
      pgmap v577: 576 pgs, 6 pools, 2331 MB data, 1089 objects
            1223 MB used, 1989 GB / 1991 GB avail
                 448 active+clean
                 128 incomplete
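
The number of down OSDs can be derived from the osdmap line as total minus up (here 7 - 4 = 3). A small sketch of that arithmetic follows; the function name count_down_osds is hypothetical, and the parsing assumes the "N osds: M up, K in" format shown in the output above, which may differ in other Ceph releases.

```shell
# Print the number of down OSDs, given `ceph status` text on stdin.
# Assumes an osdmap line of the form "osdmap e119: 7 osds: 4 up, 4 in".
count_down_osds() {
    awk '/osds:/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "osds:")  total = $(i - 1)   # number before "osds:"
            if ($i ~ /^up,?$/)  up = $(i - 1)      # number before "up,"
        }
        print total - up
        exit
    }'
}

# Example use on the saved status file:
#   count_down_osds < ceph_status
```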

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
