During an upgrade procedure, instances were live migrated to free compute nodes for reboot.
As a result, a Windows instance ended up running on two compute nodes at the same time, with both copies accessing the same backing storage on the Ceph backend:
[root@compute-1 ~]# virsh list --all | grep instance-000054f5
 77    instance-000054f5    running
[root@compute-2 ~]# virsh list --all | grep instance-000054f5
 12    instance-000054f5    running
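The check above can be automated across hosts. The following is a minimal sketch, assuming the `virsh list --all` output of each compute node has already been collected into a dictionary keyed by host name. The function names `running_domains` and `find_duplicates` are hypothetical and not part of any Red Hat or libvirt tool.

```python
from collections import defaultdict


def running_domains(virsh_output):
    """Return the names of domains reported as 'running' in `virsh list --all` output."""
    names = set()
    for line in virsh_output.splitlines():
        fields = line.split()
        # Data rows look like "<id> <name> <state>"; the id is a number,
        # or "-" for domains that are not running. Header and separator
        # rows do not match this shape and are skipped.
        if len(fields) >= 3 and (fields[0].isdigit() or fields[0] == "-"):
            if " ".join(fields[2:]) == "running":
                names.add(fields[1])
    return names


def find_duplicates(outputs_by_host):
    """Map each domain that is running on more than one host to those hosts."""
    hosts_by_domain = defaultdict(list)
    for host, output in sorted(outputs_by_host.items()):
        for name in running_domains(output):
            hosts_by_domain[name].append(host)
    return {name: hosts for name, hosts in hosts_by_domain.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Sample data taken from the two transcripts in this article.
    outputs = {
        "compute-1": " 77    instance-000054f5    running\n",
        "compute-2": " 12    instance-000054f5    running\n",
    }
    for domain, hosts in find_duplicates(outputs).items():
        print(domain, "is running on", ", ".join(hosts))
```

Any domain returned by `find_duplicates` should be treated as at risk of corrupting its volume until one of the two copies is stopped.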
The initial live migration failed, and the libvirt log shows the following error:
2016-11-22T17:07:22.131885Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x1 - used_idx 0x2
2016-11-22T17:07:22.132163Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon'
2016-11-22T17:07:22.133659Z qemu-kvm: load of migration failed: Operation not permitted
2016-11-22 17:07:22.137+0000: shutting down
2016-11-22 18:48:55.934+0000: starting up libvirt version: 2.0.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-09-21-10:15:26, x86-038.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-27.el7),
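To find this failure in other instance logs, the log text can be scanned for the `load of migration failed` message. This is a minimal sketch, assuming the log has been read into a string (per-domain QEMU logs are typically found under `/var/log/libvirt/qemu/` on the compute node). The function name `migration_failures` is hypothetical, and the pattern only matches the message format shown in the excerpt above.

```python
import re

# Matches lines such as:
# 2016-11-22T17:07:22.133659Z qemu-kvm: load of migration failed: Operation not permitted
FAILURE = re.compile(r"^(\S+) qemu-kvm: load of migration failed: (.*)$")


def migration_failures(log_text):
    """Return (timestamp, reason) pairs for each failed incoming migration in the log."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE.match(line.strip())
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


if __name__ == "__main__":
    sample = (
        "2016-11-22T17:07:22.132163Z qemu-kvm: error while loading state "
        "for instance 0x0 of device '0000:00:05.0/virtio-balloon'\n"
        "2016-11-22T17:07:22.133659Z qemu-kvm: load of migration failed: "
        "Operation not permitted\n"
        "2016-11-22 17:07:22.137+0000: shutting down\n"
    )
    for timestamp, reason in migration_failures(sample):
        print(timestamp, reason)
```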
This happened to a Windows 2012 instance. The incident corrupted the Ceph volume because two qemu processes wrote to the same volume.
Timeline of the events that led to the issue:
- The instance is live migrated.
- The migration finishes with errors: the instance is still running on the old compute node (compute-1) and is shut off on the destination compute node (compute-2).
- The user notices that the instance is shut down and cannot be reached, and turns it on. The instance is now running on both compute nodes.
The instance had to be restored from backup because of the data corruption caused when both running copies wrote to the same backend.
- Red Hat OpenStack Platform 8.0
- not updated compute running
- updated compute running