Live Merge times out and fails on the engine but actually succeeds on the host

Solution In Progress

Issue

  • A Live Merge (snapshot deletion) timed out on the engine and was reported as failed, but the operation continued and succeeded on the host. This left the VM's volumes in an inconsistent state that required manual intervention.

  • The engine log contained:

2016-02-09 17:36:39,783 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-6-thread-4) [433adeb7] Command MergeVDSCommand(HostName = host2, MergeVDSCommandParameters{HostId = 8064fb65-939e-4675-aa30-764f4e885e9a, vmId=482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26, storagePoolId=dffa92b5-0826-48f1-98a0-c6bfe1cf169c, storageDomainId=2c565c22-1e20-4c07-8457-51726763471a, imageGroupId=31b76a05-3b45-46ac-bed0-9a78fb7d12d2, imageId=9d69d818-c1a0-4f39-adee-bb2320b33dc9, baseImageId=3658e7a1-61ef-437e-82bd-29b20d3531e6, topImageId=9d69d818-c1a0-4f39-adee-bb2320b33dc9, bandwidth=0}) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

2016-02-09 17:36:59,762 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-73) [616edc76] Merging of snapshot 62306512-ed74-4f40-a67a-dca1de382a00 images 3658e7a1-61ef-437e-82bd-29b20d3531e6..9d69d818-c1a0-4f39-adee-bb2320b33dc9 failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.

  • The vdsm logs on the host on which the VM was running contained:

jsonrpc.Executor-worker-6::DEBUG::2016-02-09 17:33:39,295::__init__::481::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.merge' in bridge with {u'topVolUUID': u'9d69d818-c1a0-4f39-adee-bb2320b33dc9', u'vmID': u'482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26', u'drive': {u'domainID': u'2c565c22-1e20-4c07-8457-51726763471a', u'volumeID': u'9d69d818-c1a0-4f39-adee-bb2320b33dc9', u'poolID': u'dffa92b5-0826-48f1-98a0-c6bfe1cf169c', u'imageID': u'31b76a05-3b45-46ac-bed0-9a78fb7d12d2'}, u'bandwidth': u'0', u'jobUUID': u'7ee66159-cedd-40b3-b771-33cbde950acc', u'baseVolUUID': u'3658e7a1-61ef-437e-82bd-29b20d3531e6'}

jsonrpc.Executor-worker-6::DEBUG::2016-02-09 17:48:43,085::__init__::514::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.merge' in bridge with True

Thread-30532::INFO::2016-02-09 17:48:58,933::vm::6235::vm.Vm::(tryPivot) vmId=`482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26`::Requesting pivot to complete active layer commit (job 7ee66159-cedd-40b3-b771-33cbde950acc)
Thread-30532::INFO::2016-02-09 17:48:58,967::vm::6303::vm.Vm::(_waitForXMLUpdate) vmId=`482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26`::Waiting for libvirt to update the XML after pivot of drive virtio-disk1 completed
Thread-30532::INFO::2016-02-09 17:48:58,984::vm::6248::vm.Vm::(tryPivot) vmId=`482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26`::Pivot completed (job 7ee66159-cedd-40b3-b771-33cbde950acc)

Thread-30532::DEBUG::2016-02-09 17:48:59,004::vm::6109::vm.Vm::(_syncVolumeChain) vmId=`482aa8aa-aaa0-45ee-ac87-09dbb1fc3b26`::vdsm chain: ['3658e7a1-61ef-437e-82bd-29b20d3531e6', '9d69d818-c1a0-4f39-adee-bb2320b33dc9'], libvirt chain: ['3658e7a1-61ef-437e-82bd-29b20d3531e6']
  • The RHEV database still contained both images, one now marked as illegal.

  • Both images still physically existed in the storage domain.

  • The volume metadata for the merged image was marked as ILLEGAL.

  • The volume metadata for the parent image still contained VOLTYPE=INTERNAL.

  • Despite this, the volumes in question had actually been merged, i.e. the snapshot deletion had succeeded on the host.
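
The symptoms above can be cross-checked directly from the log excerpts. The following is a minimal Python sketch, not a supported tool: the function names `removed_from_libvirt_chain` and `seconds_between` are hypothetical, and the sketch assumes the log line formats are exactly those shown in the excerpts above. It reports which volume vdsm still lists but libvirt has dropped from the chain, and how long after the 'VM.merge' call the engine error and the vdsm return were logged. Engine and host clocks may differ, so the intervals are only approximate.

```python
import ast
import re
from datetime import datetime

# The chain portion of the _syncVolumeChain line from the vdsm log above.
SYNC_LINE = (
    "vdsm chain: ['3658e7a1-61ef-437e-82bd-29b20d3531e6', "
    "'9d69d818-c1a0-4f39-adee-bb2320b33dc9'], "
    "libvirt chain: ['3658e7a1-61ef-437e-82bd-29b20d3531e6']"
)

# Assumes the two chains are logged as Python-style lists, as in the excerpt.
CHAIN_RE = re.compile(r"vdsm chain: (\[.*?\]), libvirt chain: (\[.*?\])")


def removed_from_libvirt_chain(line):
    """Return volumes that vdsm still lists but libvirt no longer uses."""
    match = CHAIN_RE.search(line)
    if match is None:
        raise ValueError("no volume chain information in line")
    vdsm_chain = ast.literal_eval(match.group(1))
    libvirt_chain = ast.literal_eval(match.group(2))
    return [vol for vol in vdsm_chain if vol not in libvirt_chain]


def seconds_between(start, end):
    """Seconds between two 'YYYY-MM-DD HH:MM:SS,mmm' log timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    merge_called = "2016-02-09 17:33:39,295"    # vdsm: Calling 'VM.merge'
    engine_error = "2016-02-09 17:36:39,783"    # engine: MergeVDSCommand failed
    merge_returned = "2016-02-09 17:48:43,085"  # vdsm: Return 'VM.merge'

    print("Merged away on the host:", removed_from_libvirt_chain(SYNC_LINE))
    print("Engine error after (s):", seconds_between(merge_called, engine_error))
    print("vdsm returned after (s):", seconds_between(merge_called, merge_returned))
```

With the timestamps from the excerpts, the engine logged the timeout about 180 seconds after vdsm received the call, while vdsm returned only after roughly 904 seconds, and the only volume missing from the libvirt chain is 9d69d818-c1a0-4f39-adee-bb2320b33dc9, the top image of the merge.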

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.5
  • Red Hat Enterprise Linux (RHEL) 7.2 hosts

    • vdsm-4.16.32-1
  • Live Merge, i.e. snapshot deletion while the VM is up and running
