Snapshot task doesn't complete and remains in locked state.

Solution In Progress - Updated -

Environment

  • Red Hat Virtualization (RHV-M) version 4.4.3
  • Host version RHEL - 8.4 and vdsm-4.40.60.7-1.el8ev

Issue

  • Snapshot create was initiated for a VM and in the events tab of the RHV-M UI/Webadmin shows task stuck on Creating VM snapshot.

  • The RHV-M environment is at cluster level 4.3 with hosts on version RHV-H 4.4 or RHEL 8.4 version.

Resolution

  • Please upgrade to RHV-M 4.4.6 version to avoid this problem.

  • Please do not shutdown the VM that has the snapshot task stuck. Since the snapshot image volume was created for the VM and is not in use, please open a case with Red Hat Support to correct the VM state.
    The image will need to be removed from RHV-M engine database and from the storage domain.

Root Cause

  • This error is generate from the /usr/lib/python3.6/site-packages/vdsm/API.py:
        if timeout <= 0:
            # It makes no sense to run a snapshot with timeout <= 0.
            # It can happen due to an Engine bug running a synchronous
            # instead of an asynchronous snapshot (bz#1950209).
            self.log.error("Zero snapshot timeout requested")
            raise exception.InvalidParameter(action='VM.snapshot',
                                             parameter='timeout', value=0)
  • Starting from 4.4, the snapshot operation has changed to use asynchronous methods.

  • The problem with snapshot operation is due to having 4.3 Cluster level with 4.4.6 hosts.

  • In Cluster level 4.3, the old synchronous snapshot flow is used but since the hosts are higher versions, it no longer has the code for the synchronous snapshot flow.

Diagnostic Steps

  • According to the RHV-M database, the VM has one disk with a snapshot image.
    The snapshot and a disk images are in LOCKED state.

image_guid | image_group_id | vm_snapshot_id | parentid | imagestatus | volume_type | volume_format | active --------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------+-------------+---------------+-------- 87c87a5a-b547-4f73-bec6-039621a07300 | ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6 | 40926526-c5d6-419c-b7de-ae3dfbdda04b | 00000000-0000-0000-0000-000000000000 | 1 | 1 | 5 | f b315cf48-51dc-4115-86b4-854c0d4b2411 | ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6 | 695569e2-59bc-4d8a-a121-6a4edc7872be | 87c87a5a-b547-4f73-bec6-039621a07300 | 2 | 2 | 4 | t (2 rows) For image_guid = 87c87a5a-b547-4f73-bec6-039621a07300 , SD = 1f7638f0-c62a-4d2d-9de3-25813b02a2c9 ( MY_SD / V5 ) ; virtual size = 214748364800 , actual size = 214748364800 For image_guid = b315cf48-51dc-4115-86b4-854c0d4b2411 , SD = 1f7638f0-c62a-4d2d-9de3-25813b02a2c9 ( MY_SD / V5 ) ; virtual size = 214748364800 , actual size = 214748364800 snapshots: snapshot_id | vm_id | snapshot_type | status | description | creation_date | memory_metadata_disk_id | memory_dump_disk_id --------------------------------------+--------------------------------------+---------------+--------+-------------+----------------------------+-------------------------+--------------------- 695569e2-59bc-4d8a-a121-6a4edc7872be | 16a759db-7247-43d3-8ff0-291150948ec6 | ACTIVE | OK | Active VM | 2021-04-01 18:26:23.098+00 | | 40926526-c5d6-419c-b7de-ae3dfbdda04b | 16a759db-7247-43d3-8ff0-291150948ec6 | REGULAR | LOCKED | my_snapshot | 2021-07-28 12:26:37.4+00 | | (2 rows)
  • Both disk images exists in the storage domain, however the b315cf48-51dc-4115-86b4-854c0d4b2411 image is not open or active by the VM.
  87c87a5a-b547-4f73-bec6-039621a07300 1f7638f0-c62a-4d2d-9de3-25813b02a2c9    1 -wi-ao----  200.00g  IU_ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6,MD_39,PU_00000000-0000-0000-0000-000000000000 
  b315cf48-51dc-4115-86b4-854c0d4b2411 1f7638f0-c62a-4d2d-9de3-25813b02a2c9    1 -wi-------    1.00g  IU_ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6,MD_66,PU_87c87a5a-b547-4f73-bec6-039621a07300
  • The metadata information show disk images in correct state.
   image:    ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6

             - 87c87a5a-b547-4f73-bec6-039621a07300
               status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: PREALLOCATED, capacity: 214748364800, truesize: 214748364800

             - b315cf48-51dc-4115-86b4-854c0d4b2411
               status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 214748364800, truesize: 1073741824 
  • The VM XML shows only the base image in use by the VM.
      <source dev='/rhev/data-center/mnt/blockSD/1f7638f0-c62a-4d2d-9de3-25813b02a2c9/images/ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6/87c87a5a-b547-4f73-bec6-039621a07300' index='1'>
  • Engine logs show failure error message of timeout value of 0.
2021-07-28 15:27:02,181+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [8ba4e626-c9d4-4677-8181-d7b08d835bff] Command 'SnapshotVDSCommand(HostName = myhost.example.com, SnapshotVDSCommandParameters:{hostId='bc6ef865-6da6-4135-aaa9-c2d03dd99ff2', vmId='16a759db-7247-43d3-8ff0-291150948ec6'})' execution failed: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Invalid parameter: {'action': 'VM.snapshot', 'parameter': 'timeout', 'value': 0}, code = 91

2021-07-28 15:27:02,181+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [8ba4e626-c9d4-4677-8181-d7b08d835bff] FINISH, SnapshotVDSCommand, return: , log id: 78411612

2021-07-28 15:27:02,186+03 WARN  [org.ovirt.engine.core.bll.snapshots.CreateLiveSnapshotForVmCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-98) [8ba4e626-c9d4-4677-8181-d7b08d835bff] Could not perform live snapshot due to error, VM will still be configured to the new created snapshot: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Invalid parameter: {'action': 'VM.snapshot', 'parameter': 'timeout', 'value': 0}, code = 91 (Failed with error unexpected and code 16)
  • Host vdsm logs shows the same failure.
2021-07-28 15:25:43,342+0300 INFO  (jsonrpc/3) [vdsm.api] START createVolume(sdUUID='1f7638f0-c62a-4d2d-9de3-25813b02a2c9', spUUID='5abb7ad8-0132-01a9-0137-000000000328', imgUUID='ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6', size='214748364800', volFormat=4, preallocate=2, diskType='DATA', volUUID='b315cf48-51dc-4115-86b4-854c0d4b2411', desc='', srcImgUUID='ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6', srcVolUUID='87c87a5a-b547-4f73-bec6-039621a07300', initialSize=None, addBitmaps=False) from=::ffff:XX.X.XXX.XX,50328, flow_id=8ba4e626-c9d4-4677-8181-d7b08d835bff, task_id=17e94ce4-ba00-4fe9-9b19-c1e67d3622a3 (api:48)
2021-07-28 15:25:43,409+0300 INFO  (jsonrpc/3) [vdsm.api] FINISH createVolume return=None from=::ffff:XX.X.XXX.XX,50328, flow_id=8ba4e626-c9d4-4677-8181-d7b08d835bff, task_id=17e94ce4-ba00-4fe9-9b19-c1e67d3622a3 (api:54)


2021-07-28 15:25:44,222+0300 INFO  (tasks/0) [storage.Volume] Creating volume b315cf48-51dc-4115-86b4-854c0d4b2411 (volume:1227)
2021-07-28 15:25:44,297+0300 INFO  (tasks/0) [storage.LVM] Creating LV (vg=1f7638f0-c62a-4d2d-9de3-25813b02a2c9, lv=b315cf48-51dc-4115-86b4-854c0d4b2411, size=1024m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',), device=None) (lvm:1566)


2021-07-28 15:25:44,733+0300 INFO  (tasks/0) [storage.Volume] Creating volume symlink from '/dev/1f7638f0-c62a-4d2d-9de3-25813b02a2c9/b315cf48-51dc-4115-86b4-854c0d4b2411' to '/rhev/data-center/mnt/blockSD/1f7638f0-c62a-4d2d-9de3-25813b02a2c9/images/ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6/b315cf48-51dc-4115-86b4-854c0d4b2411' (blockVolume:512)
2021-07-28 15:25:44,733+0300 INFO  (tasks/0) [storage.Volume] Request to create snapshot ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6/b315cf48-51dc-4115-86b4-854c0d4b2411 of volume ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6/87c87a5a-b547-4f73-bec6-039621a07300 with capacity 214748364800 (blockVolume:528)

2021-07-28 15:26:00,130+0300 INFO  (jsonrpc/2) [api.virt] START snapshot(snapDrives=[{'imageID': 'ecd5fd4c-d84f-4b84-be28-c03bfb7c84a6', 'baseVolumeID': '87c87a5a-b547-4f73-bec6-039621a07300', 'volumeID': 'b315cf48-51dc-4115-86b4-854c0d4b2411', 'domainID': '1f7638f0-c62a-4d2d-9de3-25813b02a2c9'}], snapMemory=None, frozen=False, jobUUID='b5ec02ef-defe-447f-b0dd-05f559ff5644', timeout=0) from=::ffff:XX.X.XXX.XX,44102, flow_id=8ba4e626-c9d4-4677-8181-d7b08d835bff, vmId=16a759db-7247-43d3-8ff0-291150948ec6 (api:48)
2021-07-28 15:26:00,131+0300 ERROR (jsonrpc/2) [vds] Zero snapshot timeout requested (API:719)
2021-07-28 15:26:00,131+0300 ERROR (jsonrpc/2) [api] FINISH snapshot error=Invalid parameter: {'action': 'VM.snapshot', 'parameter': 'timeout', 'value': 0} (api:131)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/API.py", line 721, in snapshot
    parameter='timeout', value=0)
vdsm.common.exception.InvalidParameter: Invalid parameter: {'action': 'VM.snapshot', 'parameter': 'timeout', 'value': 0}
2021-07-28 15:26:00,133+0300 INFO  (jsonrpc/2) [api.virt] FINISH snapshot return={'status': {'code': 91, 'message': "Invalid parameter: {'action': 'VM.snapshot', 'parameter': 'timeout', 'value': 0}"}} from=::ffff:XX.X.XXX.XX,44102, flow_id=8ba4e626-c9d4-4677-8181-d7b08d835bff, vmId=16a759db-7247-43d3-8ff0-291150948ec6 (api:54)
2021-07-28 15:26:00,134+0300 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 91) in 0.00 seconds (__init__:312)
  • The Snapshot timeout in RHV-M engine database is set to 30mins.
# engine-config -g LiveSnapshotTimeoutInMinutes
LiveSnapshotTimeoutInMinutes: 30 version: general
  • The ovirt-engine is ovirt-engine-4.4.3.12-0.1.el8ev.noarch version but the is 4.4.6 version.
          vds_name     |                pretty_name                |       host_os        |      rpm_version       | cluster_compatibility_version 
-----------------------+-------------------------------------------+----------------------+------------------------+-------------------------------
 myhost.example.com    | Red Hat Virtualization Host 4.4.6 (el8.4) | RHEL - 8.4 - 2.el8ev | vdsm-4.40.60.7-1.el8ev | 4.3

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.