Live Merge failures and their meaning

Solution In Progress - Updated -

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.5

Issue

  • Live Merge can fail and report some very specific errors that need explanation. This article will describe the following specific ones at this time.

  • VolumeResizeValueError;

9ebc2dca-99d1-4158-8e45-6e70bf6895ea::ERROR::2016-01-27 01:20:42,077::volume::575::Storage.Volume::(extendSize) current size of volume b1b7fe39-c212-4483-9b64-479b1e69979a is larger than the size requested in the extension (2311323648 > 2097152000)
  File "/usr/share/vdsm/storage/volume.py", line 576, in extendSize

9ebc2dca-99d1-4158-8e45-6e70bf6895ea::ERROR::2016-01-27 01:20:42,079::task::866::Storage.TaskManager.Task::(_setError) Task=`9ebc2dca-99d1-4158-8e45-6e70bf6895ea`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/storage/task.py", line 334, in run
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1325, in extendVolumeSize
  File "/usr/share/vdsm/storage/volume.py", line 576, in extendSize
VolumeResizeValueError: Incorrect size value for volume resize: (2097152000,)
  • VolumeImageHasChildren;
4a07a9c2-05f2-42a2-a207-84945946d527::ERROR::2016-01-28 01:13:12,958::task::866::Storage.TaskManager.Task::(_setError) Task=`4a07a9c2-05f2-42a2-a207-84945946d527`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/storage/task.py", line 334, in run
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1864, in deleteVolume
  File "/usr/share/vdsm/storage/blockVolume.py", line 209, in delete
  File "/usr/share/vdsm/storage/volume.py", line 505, in validateDelete
VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: af3df02d-1e22-497c-a2f5-f9ae0a87e730', u'imgUUID: 58a37aef-9db5-417b-b08b-e2d77aa4b6a5', u'volUUID: 123744f4-4407-494f-a155-3830cdbdef1e']
  • ImageIsNotLegalChain;
Thread-6376::ERROR::2016-01-25 20:33:10,563::image::201::Storage.Image::(getChain) There is no leaf in the image c25d6e0c-4f31-4422-b025-851747cf0c7f
Thread-6376::WARNING::2016-01-25 20:33:10,564::resourceManager::591::Storage.ResourceManager::(registerResource) Resource factory failed to create resource '50f775da-5747-4eb8-9778-c89ec3970999_imageNS.c25d6e0c-4f31-4422-b025-851747cf0c7
f'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 587, in registerResource
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
  File "/usr/share/vdsm/storage/image.py", line 202, in getChain
ImageIsNotLegalChain: Image is not a legal chain: (u'c25d6e0c-4f31-4422-b025-851747cf0c7f',)

Resolution

Please refer to the following article for steps to take to capture pertinent information and then contact Red Hat Support Technical Support for assistance.

Root Cause

  • The afore-mentioned errors can occur at different stages of the Live Merge sequence and can leave the volumes of each disk of a VM in various inconsistent states. The inconsistencies need to be reconciled before any subsequent actions on the VM can be performed.

  • The areas that can become inconsistent are;

    • RHEV database
    • LV tags (for block-based storage domains)
    • Volume metadata
    • qemu image chains
  • Please refer to the Diagnostic Steps section for details.

Diagnostic Steps

1) VolumeResizeValueError;

In this case the Live Merge essentially didn't even get started. The first thing it did was to erroneously try to extend the base image, which failed. This was caused by the volume sizes being inconsistent between the RHEV database and the qemu image.

Specifically, the disk images showed a base image of size 536870912000, yet the actual_size in the disk_image_dynamic table entry showed 1183397707776;

# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select image_guid,image_group_id,parentid,imagestatus,volume_type,volume_format,active,size from images where image_group_id in (select device_id from vm_device where vm_id in (select vm_guid from vm_static where vm_name = 'vm-x'));"
              image_guid              |            image_group_id            |               parentid               | imagestatus | volume_type | volume_format | active |     size      
--------------------------------------+--------------------------------------+--------------------------------------+-------------+-------------+---------------+--------+---------------
 123744f4-4407-494f-a155-3830cdbdef1e | 58a37aef-9db5-417b-b08b-e2d77aa4b6a5 | b1b7fe39-c212-4483-9b64-479b1e69979a |           1 |           2 |             4 | t      | 1073741824000
 b1b7fe39-c212-4483-9b64-479b1e69979a | 58a37aef-9db5-417b-b08b-e2d77aa4b6a5 | 00000000-0000-0000-0000-000000000000 |           4 |           1 |             5 | f      |  536870912000

The qemu image sizes were;

# qemu-img info /dev/af3df02d-1e22-497c-a2f5-f9ae0a87e730/123744f4-4407-494f-a155-3830cdbdef1e
image: /dev/af3df02d-1e22-497c-a2f5-f9ae0a87e730/123744f4-4407-494f-a155-3830cdbdef1e
file format: qcow2
virtual size: 1.0T (1073741824000 bytes)

# qemu-img info /dev/af3df02d-1e22-497c-a2f5-f9ae0a87e730/b1b7fe39-c212-4483-9b64-479b1e69979a
image: /dev/af3df02d-1e22-497c-a2f5-f9ae0a87e730/b1b7fe39-c212-4483-9b64-479b1e69979a
file format: raw
virtual size: 1.1T (1183397707776 bytes)

The error reported was;

9ebc2dca-99d1-4158-8e45-6e70bf6895ea::ERROR::2016-01-27 01:20:42,077::volume::575::Storage.Volume::(extendSize) current size of volume b1b7fe39-c212-4483-9b64-479b1e69979a is larger than the size requested in the extension (2311323648 > 2097152000)
  File "/usr/share/vdsm/storage/volume.py", line 576, in extendSize

9ebc2dca-99d1-4158-8e45-6e70bf6895ea::ERROR::2016-01-27 01:20:42,079::task::866::Storage.TaskManager.Task::(_setError) Task=`9ebc2dca-99d1-4158-8e45-6e70bf6895ea`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/storage/task.py", line 334, in run
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1325, in extendVolumeSize
  File "/usr/share/vdsm/storage/volume.py", line 576, in extendSize
VolumeResizeValueError: Incorrect size value for volume resize: (2097152000,)

The sizes shown above are calculated as follows;

2097152000 * 512 = 1073741824000 bytes

2311323648 * 512 = 1183397707776 bytes

What the code in VDSM did was to compare the qemu image size of the child image, i.e. 2311323648 (1183397707776 bytes) with the size of the base image in the RHEV database of 2097152000 (1073741824000 bytes), and since it was already larger, it reported the error and terminated the Live Merge sequence.

What had happened prior to this was that many Live Merges and Live Storage Migrations had failed, and somehow left the base image with an incorrect size in the RHEV database.

Since the resolution to this requires modifying the RHEV database, we recommend contacting Red Hat Technical Support for assistance with this.

2) VolumeImageHasChildren;

In this case Live Merge was successful in actually merging the volume data, as shown in the engine logs;

2016-01-27 20:15:55,292 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-52) [4d0cf5a9] Snapshot 6c37b3ff-dbd7-46a5-bcb1-d089d445fd6d images 123744f4-4407-494f-a155-3830cdbdef1e..b1b7fe39-c212-4483-9b64-479b1e69979a merged, but volume removal failed. Some or all of the following volumes may be orphaned: [123744f4-4407-494f-a155-3830cdbdef1e]. Please retry Live Merge on the snapshot to complete the operation.

and in the vdsm logs;

Thread-9062::INFO::2016-01-28 01:12:10,155::vm::6303::vm.Vm::(_waitForXMLUpdate) vmId=`012a92ec-fa6d-4369-ae4e-b75903d16836`::Waiting for libvirt to update the XML after pivot of drive virtio-disk0 completed

However, the subsequent attempt to remove the merged volume failed, as shown in the vdsm logs;

jsonrpc.Executor-worker-2::DEBUG::2016-01-28 01:13:12,209::taskManager::78::Storage.TaskManager::(scheduleJob) scheduled job deleteVolume for task 4a07a9c2-05f2-42a2-a207-84945946d527 
jsonrpc.Executor-worker-2::DEBUG::2016-01-28 01:13:12,231::__init__::514::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Image.deleteVolumes' in bridge with 4a07a9c2-05f2-42a2-a207-84945946d527

4a07a9c2-05f2-42a2-a207-84945946d527::ERROR::2016-01-28 01:13:12,958::task::866::Storage.TaskManager.Task::(_setError) Task=`4a07a9c2-05f2-42a2-a207-84945946d527`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
  File "/usr/share/vdsm/storage/task.py", line 334, in run
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1864, in deleteVolume
  File "/usr/share/vdsm/storage/blockVolume.py", line 209, in delete
  File "/usr/share/vdsm/storage/volume.py", line 505, in validateDelete
VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: af3df02d-1e22-497c-a2f5-f9ae0a87e730', u'imgUUID: 58a37aef-9db5-417b-b08b-e2d77aa4b6a5', u'volUUID: 123744f4-4407-494f-a155-3830cdbdef1e']

Prior to the live merge attempt, the two active volumes of this disk were (from lvs -o +tags);

  123744f4-4407-494f-a155-3830cdbdef1e af3df02d-1e22-497c-a2f5-f9ae0a87e730   15 -wi-ao---- 287.88g  -1  -1  253   23                                                     RrI3RD-8Lwk-s76z-1Qbd-1vwr-0yYt-zSp3nh          IU_58a37aef-9db5-417b-b08b-e2d77aa4b6a5,MD_4,PU_b1b7fe39-c212-4483-9b64-479b1e69979a          

  b1b7fe39-c212-4483-9b64-479b1e69979a af3df02d-1e22-497c-a2f5-f9ae0a87e730    4 -wi-ao----   1.08t  -1  -1  253   20                                                     K4Q9lQ-SwP6-CFA1-GxRo-I8ZI-TP2m-0IPtjY          IU_58a37aef-9db5-417b-b08b-e2d77aa4b6a5,MD_15,PU_00000000-0000-0000-0000-000000000000

The Live Merge was trying to remove 123744f4-4407-494f-a155-3830cdbdef1e. However, another volume was still showing this as its parent. Its LV tags contained PU_123744f4-4407-494f-a155-3830cdbdef1e;

  d2ddf7f7-524c-4124-99e0-c078fa9bf2d6 af3df02d-1e22-497c-a2f5-f9ae0a87e730    2 -wi-a-----   4.00g  -1  -1  253   24                                                     S6IZeK-TI39-i06w-OFAb-R9Un-5odJ-P7ck39          IU_58a37aef-9db5-417b-b08b-e2d77aa4b6a5,MD_11,PU_123744f4-4407-494f-a155-3830cdbdef1e

The volume d2ddf7f7-524c-4124-99e0-c078fa9bf2d6 was residual from some previous Live Merge and Live Snapshot Migration failures.

The resolution to this would be to;

Remove the `d2ddf7f7-524c-4124-99e0-c078fa9bf2d6` volume.
Remove the `123744f4-4407-494f-a155-3830cdbdef1e` volume.
Modify the metadata of the parent (in this case, base) volume.
Modify the RHEV database.

Since this requires the RHEV database to be modified in conjunction with other steps that need to be carefully evaluated, we recommend contacting Red Hat Technical Support for assistance with this.

3) ImageIsNotLegalChain;

In this case Live Merge was successful in actually merging the volume data, but the subsequent attempt to remove the merged volume failed.

Again we see that the merge was successful in the engine logs;

2016-01-25 16:56:43,939 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-54) [1d5e0f31] Snapshot d0624913-1671-49ec-8dca-cb87ae6ce3ed images b5a29b50-39cc-4685-9857-af57d4094cb8..ef27bbf6-f111-455e-a109-1d43bfdbbc17 merged, but volume removal failed. Some or all of the following volumes may be orphaned: [b5a29b50-39cc-4685-9857-af57d4094cb8]. Please retry Live Merge on the snapshot to complete the operation.

and in the vdsm logs;

Thread-6496::INFO::2016-01-25 21:52:42,778::vm::6303::vm.Vm::(_waitForXMLUpdate) vmId=`00f9001f-9b86-4fe1-a9aa-3240fcf7a7ef`::Waiting for libvirt to update the XML after pivot of drive virtio-disk0 completed

Then the following was reported because the metadata for both volumes contained VOLTYPE=INTERNAL;

Thread-6497::WARNING::2016-01-25 21:52:57,729::resourceManager::591::Storage.ResourceManager::(registerResource) Resource factory failed to create resource '50f775da-5747-4eb8-9778-c89ec3970999_imageNS.c25d6e0c-4f31-4422-b025-851747cf0c7f'. Canceling request.
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/resourceManager.py", line 587, in registerResource
  File "/usr/share/vdsm/storage/resourceFactories.py", line 193, in createResource
  File "/usr/share/vdsm/storage/resourceFactories.py", line 122, in __getResourceCandidatesList
  File "/usr/share/vdsm/storage/image.py", line 202, in getChain
ImageIsNotLegalChain: Image is not a legal chain: (u'c25d6e0c-4f31-4422-b025-851747cf0c7f',)

The metadata of the outermost child image (leaf) in the chain should contain VOLTYPE=LEAF.

The resolution to this would be to;

Remove the leaf volume.
Modify the metadata of the parent volume.
Modify the RHEV database.

Since this requires the RHEV database to be modified in conjunction with other steps that need to be carefully evaluated, we recommend contacting Red Hat Technical Support for assistance with this.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments