RHV VM fails to start due to a corrupt qcow image
Environment
- Red Hat Virtualization (RHV) 4.1
- Red Hat Enterprise Linux (RHEL) 7.4 host
- vdsm-4.19.31-1.el7ev.x86_64
- libvirt-daemon-3.2.0-14.el7_4.3.x86_64
- qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64
Issue
-
One of our VM servers is in down status. We couldn't start it.
-
The error reported in the RHV Admin Portal is;
VM xyz is down with error. Exit message: internal error: process exited while connecting to monitor
- The error reported in the engine logs is;
2018-05-21T14:22:52.365657Z qemu-kvm: -drive file=/rhev/data-center/083df2f6-2a0d-4eca-a55c-a3d329374a42/a45c58ef-66d7-4bbf-b754-43c48842a6db/images/170b40d2-03eb-4506-83b2-707ac56e3809/55a11516-23e2-4ebf-972e-4aa46aec09bb,format=qcow2,if=none,id=drive-virtio-disk0,serial=170b40d2-03eb-4506-83b2-707ac56e3809,cache=none,werror=stop,rerror=stop,aio=native: qcow2: Image is corrupt; cannot be opened read/write.
- The error reported in the vdsm logs contains;
2018-05-21 15:42:51,836+0300 ERROR (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on domain a45c58ef-66d7-4bbf-b754-43c48842a6db, pausing the VM to avoid corruptions (capacity: 0, allocated: 18267308032, physical: 47244640256, next physical size: 0) (vm:880)
Resolution
-
The root cause has not been found at this time.
-
We know that the qcow image was corrupt. However, without seeing the corrupted image, we cannot make any further determination.
-
The symptoms were;
- VM paused
- VDSM reported "Improbable extension request for volume ........ pausing the VM to avoid corruptions"
- 'qemu-img info' reported capacity = 0
- 'qemu-img check' failed with "Cannot allocate memory"
-
All of the above can occur if a qcow image is artificially overwritten.
-
We added the information for this specific instance to a bug that had already been opened for a similar problem, BZ 1583346. Unfortunately, a root cause was not able to be determined from the information provided.
Root Cause
-
The qcow image is corrupt.
-
The physical size is 44gb, but the qcow header virtual size is 250gb.
Diagnostic Steps
The corrupt bit in the qcow header is set to true and the virtual size is 250gb;
# qemu-img info /rhev/data-center/083df2f6-2a0d-4eca-a55c-a3d329374a42/a45c58ef-66d7-4bbf-b754-43c48842a6db/images/170b40d2-03eb-4506-83b2-707ac56e3809/55a11516-23e2-4ebf-972e-4aa46aec09bb
file format: qcow2
virtual size: 250G (268435456000 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: true
qemu-img checkreportsCheck failed: Cannot allocate memory;
# qemu-img check /rhev/data-center/083df2f6-2a0d-4eca-a55c-a3d329374a42/a45c58ef-66d7-4bbf-b754-43c48842a6db/images/170b40d2-03eb-4506-83b2-707ac56e3809/55a11516-23e2-4ebf-972e-4aa46aec09bb
tcmalloc: large alloc 1571991715840 bytes == (nil) @ 0x7fa1ce4d8bf3 0x7fa1ce4fa488 0x7fa1cdffcc56 0x55ab98294d21 0x55ab98295003 0x55ab98295316 0x55ab982955ed 0x55ab98298106 0x55ab98290f11 0x55ab9826cd7b 0x55ab9826f03e 0x55ab98267699 0x7fa1cc657c05 0x55ab9826834c
qemu-img: Check failed: Cannot allocate memory
- Volume is 44gb;
55a11516-23e2-4ebf-972e-4aa46aec09bb a45c58ef-66d7-4bbf-b754-43c48842a6db -wi-a----- 44.00g
- The RHV database shows the physical size to be 44gb, but the virtual size is only 40gb;
For image_guid = 55a11516-23e2-4ebf-972e-4aa46aec09bb , SD = a45c58ef-66d7-4bbf-b754-43c48842a6db ( GB-Test-Data-NoSync-02 ) ; virtual size = 42949672960 , actual size = 47244640256
- audit log;
2018-05-21 08:43:47.823-04 | VM xyz has been paused.
2018-05-21 08:43:47.844-04 | VM xyz has been paused due to storage I/O problem.
2018-05-21 08:44:06.555-04 | VM xyz is down with error. Exit message: Lost connection with qemu process.
2018-05-21 08:44:06.581-04 | Highly Available VM xyz failed. It will be restarted automatically.
2018-05-21 08:44:25.95-04 | Trying to restart VM xyz on Host host05
2018-05-21 08:44:35.769-04 | VM xyz is down with error. Exit message: internal error: process exited while connecting to monitor:
- vdsm logs;
2018-05-21 15:42:50,787+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') abnormal vm stop device virtio-disk0 error eio (vm:4211)
2018-05-21 15:42:50,787+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: onIOError (vm:5093)
2018-05-21 15:42:50,788+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: onSuspend (vm:5093)
2018-05-21 15:42:51,836+0300 ERROR (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:42:51,852+0300 INFO (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:42:53,826+0300 ERROR (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:42:53,830+0300 INFO (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:42:55,834+0300 ERROR (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:42:55,863+0300 INFO (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:42:57,810+0300 ERROR (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:42:57,812+0300 INFO (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:42:59,810+0300 ERROR (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:42:59,813+0300 INFO (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:43:01,830+0300 ERROR (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:43:01,833+0300 INFO (periodic/35) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:43:03,830+0300 ERROR (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on dom
2018-05-21 15:43:03,833+0300 INFO (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') CPU stopped: pause (vm:5093)
2018-05-21 15:43:05,671+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') underlying process disconnected (vm:694)
2018-05-21 15:43:05,672+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Release VM resources (vm:4298)
2018-05-21 15:43:05,675+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Stopping connection (guestagent:430)
2018-05-21 15:43:07,500+0300 INFO (libvirt/events) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Stopping connection (guestagent:430)
- vdsm log error;
2018-05-21 15:42:51,836+0300 ERROR (periodic/36) [virt.vm] (vmId='abfa1494-ac27-4394-b358-78bf92b58de0') Improbable extension request for volume 55a11516-23e2-4ebf-972e-4aa46aec09bb on domain a45c58ef-66d7-4bbf-b754-43c48842a6db, pausing the VM to avoid corruptions (capacity: 0, allocated: 18267308032, physical: 47244640256, next physical size: 0) (vm:880)
- qemu log;
2018-05-21T12:43:35.295560Z qemu-kvm: -drive file=/rhev/data-center/083df2f6-2a0d-4eca-a55c-a3d329374a42/a45c58ef-66d7-4bbf-b754-43c48842a6db/images/170b40d2-03eb-4506-83b2-707ac56e3809/55a11516-23e2-4ebf-972e-4aa46aec09bb,format=qcow2,if=none,id=drive-virtio-disk0,serial=170b40d2-03eb-4506-83b2-707ac56e3809,cache=none,werror=stop,rerror=stop,aio=native: qcow2: Image is corrupt; cannot be opened read/write
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.
