RHEV Live Storage Migration does not complete, but no errors are reported.

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.4.3
  • Red Hat Enterprise Linux (RHEL) 6.5 host;
    • vdsm-4.13.2-0.13
    • vdsm-4.14.11-5

Issue

A Live Storage Migration completes, but leaves an unfinished task/job;

  • Live Storage Migration has essentially completed, but the associated job table entry in the RHEV database is not marked as "FINISHED".

  • All of the associated disk images have been successfully moved to the destination storage domain.

  • The Tasks pane in the Admin Portal shows that the storage migration is still in progress.

Resolution

  • The problem described in the Issue section is being tracked by BZ 1160889.

  • The interim solution for this is to modify the database to set the job as "FINISHED" and then to restart the engine. There is no need to retry the storage migration as the volumes were actually successfully moved.

  • Please contact Red Hat Technical Support for assistance in resolving the issue.

  • Also, please reference this article for another similar Live Storage Migration problem.

Root Cause

All of the images associated with the disk in question were successfully moved to the destination storage domain and removed from the source storage domain. It was only the final step in the sequence performed by the engine that did not occur, where the job in the database is marked as FINISHED.

NOTE:

  • No errors were reported.
  • This condition was reproduced by running a certain number of concurrent Live Storage Migrations, in this case 20.

Diagnostic Steps

Live Storage Migration (LSM) has essentially completed, but the associated job table entry in the RHEV database is not marked as "FINISHED"

  • 20 disks were live migrated from 'LSM_GFW' to 'NFS_GFW', both NFS data domains. Only 18 completed;
$ grep 'finished moving disk' engine.log|wc -l
18
  • Unfinished job (just showing the one related to this bug, even though there were two);
                job_id                |   action_type   |                     description                     | status  |               owner_id               | visible |         start_time         | end_time |      last_update_time     
 | correlation_id | is_external | is_auto_cleared 
--------------------------------------+-----------------+-----------------------------------------------------+---------+--------------------------------------+---------+----------------------------+----------+---------------------------
-+----------------+-------------+-----------------
         | t
 0b2b0e8e-4d5e-4bb5-8ba2-61540180ecf9 | LiveMigrateDisk | Migrating Disk lsm-vm_Disk1 from LSM_GFW to NFS_GFW | STARTED | fdfc627c-d875-11e0-90f0-83df133b58cc | t       | 2014-11-04 17:34:20.886-05 |          | 2014-11-04 17:37:20.803-05
 | 62aa3df3       | f           | t
  • correlation_id '62aa3df3' in engine.log for VM 'lsm-pool-10' (completed but not "finished");
$ grep 62aa3df3 engine.log
2014-11-04 17:34:20,904 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] Lock Acquired to object EngineLock [exclusiveLocks= , sharedLocks= key: 1f6b02a8-2171-4809-807f-ef692904a093 value: VM
2014-11-04 17:34:20,924 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] Running command: LiveMigrateDiskCommand Task handler: CreateImagePlaceholderTaskHandler internal: true. Entities affected :  ID: fe5348fb-3050-40e0-a0a5-eafaa9fcc63b Type: Disk,  ID: 23cb53e2-9aa4-4a85-ac55-f46c5443c486 Type: Storage
2014-11-04 17:34:20,929 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CloneImageGroupStructureVDSCommand] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] START, CloneImageGroupStructureVDSCommand( storagePoolId = 6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8, ignoreFailoverLimit = false, storageDomainId = 9482b44a-c8f9-4b28-8700-d27424e9d4ac, imageGroupId = fe5348fb-3050-40e0-a0a5-eafaa9fcc63b, dstDomainId = 23cb53e2-9aa4-4a85-ac55-f46c5443c486), log id: f6f32b8
2014-11-04 17:34:36,195 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CloneImageGroupStructureVDSCommand] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] FINISH, CloneImageGroupStructureVDSCommand, log id: f6f32b8
2014-11-04 17:34:36,214 INFO  [org.ovirt.engine.core.bll.CommandAsyncTask] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command d5fbe880-5684-4023-a9b5-d4175dafb561
2014-11-04 17:34:36,214 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] CommandMultiAsyncTasks::AttachTask: Attaching task 80fd3977-ce73-4e33-813e-eaec052d8f9e to command d5fbe880-5684-4023-a9b5-d4175dafb561.
2014-11-04 17:34:45,827 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] Adding task 80fd3977-ce73-4e33-813e-eaec052d8f9e (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-11-04 17:34:45,877 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] Correlation ID: 62aa3df3, Job ID: 0b2b0e8e-4d5e-4bb5-8ba2-61540180ecf9, Call Stack: null, Custom Event ID: -1, Message: User admin moving disk lsm-vm_Disk1 to domain NFS_GFW.
2014-11-04 17:34:45,877 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] BaseAsyncTask::startPollingTask: Starting to poll task 80fd3977-ce73-4e33-813e-eaec052d8f9e.
2014-11-04 17:34:45,878 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] BaseAsyncTask::startPollingTask: Starting to poll task 80fd3977-ce73-4e33-813e-eaec052d8f9e.
2014-11-04 17:34:46,085 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (org.ovirt.thread.pool-4-thread-30) [62aa3df3] Lock freed to object EngineLock [exclusiveLocks= key: fe5348fb-3050-40e0-a0a5-eafaa9fcc63b value: DISK
2014-11-04 17:35:00,976 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] Ending command successfully: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2014-11-04 17:35:00,976 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] Running command: LiveMigrateDiskCommand Task handler: VmReplicateDiskStartTaskHandler internal: false. Entities affected :  ID: fe5348fb-3050-40e0-a0a5-eafaa9fcc63b Type: Disk,  ID: 23cb53e2-9aa4-4a85-ac55-f46c5443c486 Type: Storage
2014-11-04 17:35:00,977 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] START, VmReplicateDiskStartVDSCommand(HostName = rhevh-1, HostId = a2c85a15-9b53-493d-9731-8b5cccdd8951, vmId=1f6b02a8-2171-4809-807f-ef692904a093), log id: 513cd7aa
2014-11-04 17:35:11,628 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] FINISH, VmReplicateDiskStartVDSCommand, log id: 513cd7aa
2014-11-04 17:35:11,650 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SyncImageGroupDataVDSCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] START, SyncImageGroupDataVDSCommand( storagePoolId = 6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8, ignoreFailoverLimit = false, storageDomainId = 9482b44a-c8f9-4b28-8700-d27424e9d4ac, imageGroupId = fe5348fb-3050-40e0-a0a5-eafaa9fcc63b, dstDomainId = 23cb53e2-9aa4-4a85-ac55-f46c5443c486, syncType=INTERNAL), log id: 3300d6cd
2014-11-04 17:35:24,536 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SyncImageGroupDataVDSCommand] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] FINISH, SyncImageGroupDataVDSCommand, log id: 3300d6cd
2014-11-04 17:35:24,547 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] CommandMultiAsyncTasks::AttachTask: Attaching task c6cc3a9f-37b1-4fac-ab7f-f141298f186d to command d5fbe880-5684-4023-a9b5-d4175dafb561.
2014-11-04 17:35:25,773 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] Adding task c6cc3a9f-37b1-4fac-ab7f-f141298f186d (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-11-04 17:35:25,795 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-9) [62aa3df3] BaseAsyncTask::startPollingTask: Starting to poll task c6cc3a9f-37b1-4fac-ab7f-f141298f186d.
2014-11-04 17:35:56,946 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] Ending command successfully: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2014-11-04 17:35:56,946 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] Running command: LiveMigrateDiskCommand Task handler: VmReplicateDiskFinishTaskHandler internal: false. Entities affected :  ID: fe5348fb-3050-40e0-a0a5-eafaa9fcc63b Type: Disk,  ID: 23cb53e2-9aa4-4a85-ac55-f46c5443c486 Type: Storage
2014-11-04 17:35:57,105 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] START, VmReplicateDiskFinishVDSCommand(HostName = rhevh-1, HostId = a2c85a15-9b53-493d-9731-8b5cccdd8951, vmId=1f6b02a8-2171-4809-807f-ef692904a093), log id: 20b31a66
2014-11-04 17:35:58,969 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] FINISH, VmReplicateDiskFinishVDSCommand, log id: 20b31a66
2014-11-04 17:35:59,011 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] START, GetImageInfoVDSCommand( storagePoolId = 6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8, ignoreFailoverLimit = false, storageDomainId = 23cb53e2-9aa4-4a85-ac55-f46c5443c486, imageGroupId = fe5348fb-3050-40e0-a0a5-eafaa9fcc63b, imageId = a83ae26c-c686-46a1-a2e9-1290f9e99c6e), log id: 3d06ad49
2014-11-04 17:36:29,316 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.DiskImage@5d80dbdd, log id: 3d06ad49
2014-11-04 17:36:29,333 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] START, GetImageInfoVDSCommand( storagePoolId = 6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8, ignoreFailoverLimit = false, storageDomainId = 23cb53e2-9aa4-4a85-ac55-f46c5443c486, imageGroupId = fe5348fb-3050-40e0-a0a5-eafaa9fcc63b, imageId = a24db815-6f8d-4eef-909c-3618eae0358f), log id: ba23449
2014-11-04 17:36:47,518 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.DiskImage@1bf60018, log id: ba23449
2014-11-04 17:36:47,574 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] START, DeleteImageGroupVDSCommand( storagePoolId = 6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8, ignoreFailoverLimit = false, storageDomainId = 9482b44a-c8f9-4b28-8700-d27424e9d4ac, imageGroupId = fe5348fb-3050-40e0-a0a5-eafaa9fcc63b, postZeros = false, forceDelete = false), log id: 66b12fee
2014-11-04 17:36:55,650 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] FINISH, DeleteImageGroupVDSCommand, log id: 66b12fee
2014-11-04 17:36:55,666 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] CommandMultiAsyncTasks::AttachTask: Attaching task 7ca2519b-268d-43e7-9230-82caf2c0e910 to command d5fbe880-5684-4023-a9b5-d4175dafb561.
2014-11-04 17:37:07,305 INFO  [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] Adding task 7ca2519b-268d-43e7-9230-82caf2c0e910 (Parent Command LiveMigrateDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-11-04 17:37:07,310 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (org.ovirt.thread.pool-4-thread-49) [62aa3df3] BaseAsyncTask::startPollingTask: Starting to poll task 7ca2519b-268d-43e7-9230-82caf2c0e910.
  • In the vdsm logs, the deleteImage completed;
$ xzgrep deleteImage vdsm.log*xz | grep fe5348fb-3050-40e0-a0a5-eafaa9fcc63b
vdsm.log.7.xz:Thread-49349::INFO::2014-11-04 17:37:00,232::logUtils::44::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='9482b44a-c8f9-4b28-8700-d27424e9d4ac', spUUID='6d1b58e5-cc59-41e7-a65c-d2ef3def3ef8', imgUUID='fe5348fb-3050-40e0-a0a5-eafaa9fcc63b', postZero='false', force='false')
vdsm.log.7.xz:Thread-49349::DEBUG::2014-11-04 17:37:00,232::resourceManager::198::ResourceManager.Request::(__init__) ResName=`Storage.fe5348fb-3050-40e0-a0a5-eafaa9fcc63b`ReqID=`2eed0194-9db3-4882-a828-61b4332a7947`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1506' at 'deleteImage'
vdsm.log.7.xz:Thread-49349::DEBUG::2014-11-04 17:37:01,711::taskManager::68::TaskManager::(scheduleJob) scheduled job deleteImage_fe5348fb-3050-40e0-a0a5-eafaa9fcc63b for task 7ca2519b-268d-43e7-9230-82caf2c0e910 
vdsm.log.7.xz:7ca2519b-268d-43e7-9230-82caf2c0e910::DEBUG::2014-11-04 17:37:03,468::task::905::TaskManager.Task::(_runJobs) Task=`7ca2519b-268d-43e7-9230-82caf2c0e910`::Task.run: running job 0: deleteImage_fe5348fb-3050-40e0-a0a5-eafaa9fcc63b: <function <lambda> at 0x7f5721f35140> (args: () kwargs: {})
vdsm.log.7.xz:7ca2519b-268d-43e7-9230-82caf2c0e910::DEBUG::2014-11-04 17:37:03,469::task::331::TaskManager.Task::(run) Task=`7ca2519b-268d-43e7-9230-82caf2c0e910`::Job.run: running deleteImage_fe5348fb-3050-40e0-a0a5-eafaa9fcc63b: <function <lambda> at 0x7f5721f35140> (args: () kwargs: {}) callback None
  • task 7ca2519b-268d-43e7-9230-82caf2c0e910 completed (first 200 chars of each line only);
Thread-49375::DEBUG::2014-11-04 17:37:14,161::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResu
Thread-49375::DEBUG::2014-11-04 17:37:14,179::taskManager::113::TaskManager::(getAllTasksStatuses) Return: {'77aea9f7-4a7c-4b55-810d-e8859a7e611e': {'code': 0, 'message': '1 jobs completed successfull
Thread-49375::INFO::2014-11-04 17:37:14,179::logUtils::47::dispatcher::(wrapper) Run and protect: getAllTasksStatuses, Return response: {'allTasksStatus': {'77aea9f7-4a7c-4b55-810d-e8859a7e611e': {'co
Thread-49375::DEBUG::2014-11-04 17:37:14,180::task::1185::TaskManager.Task::(prepare) Task=`c3afdfcd-8129-4ae1-9988-9cd7d00c89e7`::finished: {'allTasksStatus': {'77aea9f7-4a7c-4b55-810d-e8859a7e611e':
Thread-49343::DEBUG::2014-11-04 17:37:27,965::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResu
Thread-49343::DEBUG::2014-11-04 17:37:27,974::taskManager::113::TaskManager::(getAllTasksStatuses) Return: {'77aea9f7-4a7c-4b55-810d-e8859a7e611e': {'code': 0, 'message': '1 jobs completed successfull
Thread-49343::INFO::2014-11-04 17:37:27,974::logUtils::47::dispatcher::(wrapper) Run and protect: getAllTasksStatuses, Return response: {'allTasksStatus': {'77aea9f7-4a7c-4b55-810d-e8859a7e611e': {'co
Thread-49343::DEBUG::2014-11-04 17:37:27,974::task::1185::TaskManager.Task::(prepare) Task=`77e6b48b-3553-46bc-904b-f36f764f0789`::finished: {'allTasksStatus': {'77aea9f7-4a7c-4b55-810d-e8859a7e611e' 
  • images in d/b for VM 'lsm-pool-10';
              image_guid              |            image_group_id            |            vm_snapshot_id            |               parentid               | imagestatus |     creation_date      | volume_type | volume_format | active 
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------+------------------------+-------------+---------------+--------
 a83ae26c-c686-46a1-a2e9-1290f9e99c6e | fe5348fb-3050-40e0-a0a5-eafaa9fcc63b | 7fcac7f6-9074-4b28-ac17-d5789d912582 | 2e4fbb3b-22cf-438a-ab41-89667ffafe69 |           1 | 2014-11-04 17:16:40-05 |           2 |             4 | f
 a24db815-6f8d-4eef-909c-3618eae0358f | fe5348fb-3050-40e0-a0a5-eafaa9fcc63b | 331ea26d-4a24-47c2-aa99-1d1cde5180c8 | a83ae26c-c686-46a1-a2e9-1290f9e99c6e |           1 | 2014-11-04 17:33:11-05 |           2 |             4 | t
(2 rows)

For image_guid = a83ae26c-c686-46a1-a2e9-1290f9e99c6e , SD = 23cb53e2-9aa4-4a85-ac55-f46c5443c486 ; image size = 200704
For image_guid = a24db815-6f8d-4eef-909c-3618eae0358f , SD = 23cb53e2-9aa4-4a85-ac55-f46c5443c486 ; image size = 200704
  • storage domains;
 9482b44a-c8f9-4b28-8700-d27424e9d4ac  -  LSM_GFW  -  10.18.33.48:/home/exports/lsm_nfs
 23cb53e2-9aa4-4a85-ac55-f46c5443c486  -  NFS_GFW  -  10.18.33.48:/home/exports/rhev_export
  • image_storage_domain_map;
               image_id               |          storage_domain_id           
--------------------------------------+--------------------------------------
 a83ae26c-c686-46a1-a2e9-1290f9e99c6e | 23cb53e2-9aa4-4a85-ac55-f46c5443c486 
 a24db815-6f8d-4eef-909c-3618eae0358f | 23cb53e2-9aa4-4a85-ac55-f46c5443c486 
  • The template image is in both SDs;
               image_id               |          storage_domain_id           
--------------------------------------+--------------------------------------
 2e4fbb3b-22cf-438a-ab41-89667ffafe69 | 23cb53e2-9aa4-4a85-ac55-f46c5443c486 
 2e4fbb3b-22cf-438a-ab41-89667ffafe69 | 9482b44a-c8f9-4b28-8700-d27424e9d4ac
  • /rhev/data-center on the SPM;
There is only only disk and it's in 'NFS_GFW'.
/rhev/data-center/mnt/10.18.33.48:_home_exports_rhev__export/23cb53e2-9aa4-4a85-ac55-f46c5443c486/images/fe5348fb-3050-40e0-a0a5-eafaa9fcc63b
  • What's missing;

    The ending sequence in the engine did not occur, the following messages were not reported;

  Ending command successfully: org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
  Message: User admin finished moving disk lsm-vm_Disk1 to domain NFS_GFW.

NOTE: Some of the output above is from SQL commands. Please contact Red Hat Technical Support for details about these commands.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments