RHEV 3.2 super slow snapshot deletion
I am running RHEL 6.1 on top of RHEV-H (6.4), and just for testing purposes I created and then deleted a snapshot.
Well, this snapshot deletion took about 30 minutes.
# Start of snapshot deletion
2013-08-01 12:36:24,886 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (ajp-/127.0.0.1:8702-11) [6493ec0c] Lock Acquired to object EngineLock [exclusiveLocks= key: 7bc53ccb-7d10-499f-8827-fd7353743c28 value: VM
# Delete finished
2013-08-01 13:04:03,312 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-48) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type RemoveSnapshot completed, handling the result.
2013-08-01 13:04:03,312 INFO [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-48) EntityAsyncTask::HandleEndActionResult [within thread]: EndAction for action type RemoveSnapshot succeeded, clearing tasks.
There is nothing in engine.log for about 25 minutes?
2013-08-01 12:36:34,210 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-78) Polling and updating Async Tasks: 1 tasks, 1 tasks to poll now
2013-08-01 12:36:34,285 INFO [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-78) SPMAsyncTask::PollTask: Polling task 726c1ba2-50fd-4cd9-806e-405269bc1d6c (Parent Command RemoveSnapshot, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status running.
2013-08-01 12:36:34,286 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (QuartzScheduler_Worker-78) Finished polling Tasks, will poll again in 10 seconds.
2013-08-01 13:00:59,261 INFO [org.ovirt.engine.core.bll.DbUserCacheManager] (QuartzScheduler_Worker-23) [685edad4] Start refreshing all users data
2013-08-01 13:01:05,027 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (QuartzScheduler_Worker-50) [1da29b65] Attempting to update VMs/Templates Ovf.
2013-08-01 13:01:05,029 INFO [org.ovirt.engine.core.bll.OvfDataUpdater] (QuartzScheduler_Worker-50) [1da29b65] Attempting to update VM OVFs in Data Center Default
2013-08-01 13:01:05,055 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UpdateVMVDSCommand] (QuartzScheduler_Worker-50) [1da29b65] START, UpdateVMVDSCommand( storagePoolId = 286390aa-653b-11e2-b498-377b98031745, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 00000000-0000-0000-0000-000000000000, infoDictionary.size = 1), log id: 1bc5357d
2013-08-01 13:01:05,112 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UpdateVMVDSCommand] (QuartzScheduler_Worker-50) [1da29b65] FINISH, UpdateVMVDSCommand, log id: 1bc5357d
This is the first time I have seen engine.log silent for ~25 minutes :) Is this "normal" behavior, or is something (very) wrong with my setup?
There is no load on CPU / MEM / NET / Storage.
Responses
Please open a support case with the relevant vdsm.log and engine.log for the time of the snapshot deletion. We will be able to check where the time was being spent.
The minimum time to finish the snapshot deletion would be the time to merge this 60GB disk into another disk, and maybe convert it from qcow2 to raw. So if the storage is slow for write operations (like an NFS server on an overloaded RHEL system), it can potentially take this much time. It all depends on the time it takes to copy this 60GB to another disk.
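Just to put the 60GB figure in perspective, here is a rough back-of-the-envelope sketch; the throughput values are purely assumed, not measured from this setup:

```python
# Rough, hypothetical estimate of a snapshot merge, assuming the whole
# 60GB image has to be rewritten and that storage write throughput is
# the limiting factor.
disk_gb = 60                      # size of the image being merged (from the example)
for write_mb_s in (35, 70, 150):  # assumed sustained write throughput of the storage
    minutes = disk_gb * 1024 / write_mb_s / 60
    print(f"at {write_mb_s} MB/s the merge takes ~{minutes:.0f} minutes")
```

At roughly 35 MB/s of sustained writes, rewriting 60GB already accounts for about half an hour, which is in line with the time observed above.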
Oh wow.
> The minimum time to finish the snapshot deletion would be the time to merge this 60GB disk into
> another disk, and maybe convert it from qcow2 to raw. So if the storage is slow for write
> operations (like an NFS server on an overloaded RHEL system), it can potentially take this much
> time. It all depends on the time it takes to copy this 60GB to another disk.
Hey Sadique -
If I follow what you're saying, when you delete a RHEV snapshot and all the blocks merge back together again, you end up with one single virtual disk image file, right?
If so, for Jan, even though RHEV snapshot deletion takes much longer than VMware's, it might still be an advantage over VMware. I know this seems nuts, but I have a quick story that might change your mind.
I was doing some V2Vs a couple years ago, migrating some VMware virtual machines to a RHEV environment. Nothing worked and I was getting more frustrated by the day. The VMware virtual machines had been snapshotted, but all the snapshots had been deleted. Yet the V2V migration was still failing as if the source VM had snapshots. Looking at the VMware storage with my vCenter client, I could see the .VMDK file and all the rest of the files describing that VM. Each virtual disk had its own .VMDK file - but the names looked funny.
The VMware host was a standalone ESXi machine and something told me to look at it from a different point of view. So I booted it from a Fedora Live CD, mounted its filesystem and found those .VMDK files were **NOT** single, individual files. Each virtual disk image really consisted of several little VMDK files. The VMware vCenter client presented each virtual disk as a single file, but looking at the raw directory from my Fedora Live CD, each of them was in fact several little files.
That was why my V2Vs were failing - my V2V expected a single VMDK file, not a bunch of little files linked together by an index somewhere. The cure - from the vCenter client, copy each VMDK disk image to a new directory. This put my copied disk image back into a single file again and fixed my V2V migration. The copy operations took a very long time btw.
So apparently, when you get rid of a VMware snapshot, that snapshot virtual disk file stays in place. It seems every time you make a new snapshot, you create a new virtual disk file and ***it stays around forever***, even after you delete the snapshot. The stuff never merges back into the base image; VMware just presents it as if it does.
There ain't no free lunch - when you delete a snapshot, you can either wait several minutes one time for everything to merge back together again with RHEV, or you can slowly fragment your virtual disk across lots and lots of tiny little files and pay the price a little bit with each new snapshot.
This could get really good. One big-time use for snapshots is disaster recovery prep. You have base images at both sites, then copy the snapshots periodically so the source site and DR site are reasonably close. Let's say you do a snapshot every day and then delete the old snapshot after, say 5 days. After one year with RHEV, you'll have one virtual disk file with 5 snapshot files. I wonder if you really end up with 365 little .VMDK files with VMware?
Think about the fragmentation if you have lots of virtual machines, each with lots of snapshots. This might make for some interesting benchmark tests for the Red Hat marketing department. :)
- Greg
Hey Greg,
I am not promoting VMware. I meant to say that Red Hat has to "fix" this, because the VMware sales department may use it against us in the Red Hat community as a sales pitch.
I do not use RHEV, but I use qemu/kvm at home and it works fine for me.
Our company is using VMware and I deal with that choice on a daily basis.
It is more like: "If we do not complain, software engineering has no means for improvement"
Kind regards,
Jan Gerrit
> If I follow what you're saying, when you delete a RHEV snapshot and all the blocks merge back together again, you end up with one single virtual disk image file, right?
It depends on how many snapshots you have created. If you have created 3 snapshots, then there will be 4 images/LVs (base image + 3 snaps) in the qcow2 chain. If you delete the 2nd snapshot, its image will be merged into the next image in the chain before it is removed. That leaves the VM with 2 snapshots and 3 images/disks.
To get the VM back to a single image, you have to remove all snapshots.
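A tiny, purely illustrative sketch of that bookkeeping (this is not RHEV/vdsm code, just a toy model of the chain described above):

```python
# Toy model of a qcow2 snapshot chain: base image plus one image per snapshot.
chain = ["base", "snap1", "snap2", "snap3"]   # 3 snapshots -> 4 images/LVs

def delete_snapshot(chain, name):
    """Deleting a (non-active) snapshot merges its image into the next image in the chain."""
    i = chain.index(name)
    merged_into = chain[i + 1]
    removed = chain.pop(i)
    print(f"merge {removed} into {merged_into}; chain is now {chain}")

delete_snapshot(chain, "snap2")   # leaves 2 snapshots and 3 images
# Only after deleting every snapshot does the VM collapse back to a single image.
```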
So yes, the statement below is 100% correct.
"Let's say you do a snapshot every day and then delete the old snapshot after, say 5 days. After one year with RHEV, you'll have one virtual disk file with 5 snapshot files."
> I am not promoting VMware. I meant to say that Red Hat has to "fix" this,
There is a fix, either in 3.2 or coming up in 3.3, to minimize this time by using a bit of intelligence when merging images.
Here is how it works now; this is just an example:
- You created a 60GB pre-allocated disk.
- Created a snapshot and wrote 2 GB of data after creating the snapshot.
Now you have two images in the qcow2 chain, one with 60GB and the other with 2GB. When you delete the snapshot, the entire content of the 60GB image is merged into the 2GB image. This requires the time to write the entire 60GB to the other image. Instead, if we merge the content of the 2GB image into the 60GB disk, it saves a lot of time. This change is either in 3.2 or coming up in 3.3.
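For a rough feel of the difference, here is a hedged sketch using the numbers from the example; the throughput is an assumption, not a measurement:

```python
# Compare the data that has to be copied in the two merge directions,
# using the 60GB base / 2GB snapshot-layer example above.
base_gb, top_gb = 60, 2
write_mb_s = 35                      # assumed storage write throughput

def merge_minutes(gb_to_copy):
    return gb_to_copy * 1024 / write_mb_s / 60

print(f"old behaviour, base copied into top: ~{merge_minutes(base_gb):.0f} min")
print(f"new behaviour, top copied into base: ~{merge_minutes(top_gb):.0f} min")
```

Same storage and the same snapshot, but copying 2GB instead of 60GB is the difference between about a minute and about half an hour.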
OK, cool.
But based on what I saw, VMware, at least ESXi by itself, apparently does not clean up its snapshot chains at all. Several months after you delete all of the VMware snapshots, you still have a bunch of little files linked together. Seems to me, this would make VMware snapshot deletion fast but hurt overall performance and that could be a RHEV advantage.
- Greg
