VMs become non-responsive during a Live Merge

Solution In Progress - Updated -

Issue

  • VMs become non-responsive during a Live Merge (snapshot deletion).

  • During this time the VMs cannot be accessed remotely (e.g. via ssh), although ping still succeeds.

  • This may last a few minutes, but has been observed to last much longer.

  • The vdsm logs show timeouts, e.g.:

Thread-1108850::ERROR::2016-05-31 18:05:43,221::utils::739::root::(wrapper) Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 736, in wrapper
    return f(*a, **kw)
  File "/usr/share/vdsm/virt/vm.py", line 5262, in run
    self.tryPivot()
  File "/usr/share/vdsm/virt/vm.py", line 5231, in tryPivot
    ret = self.vm._dom.blockJobAbort(self.drive.name, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 76, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
  • Attempting to check the block job status also times out:
# virsh -r blockjob VM-A vda
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
  • Other attempts to access the VM via the qemu monitor time out as well:
# virsh -r list
 Id    Name                           State
----------------------------------------------------
 10    VM-A                      running

# virsh qemu-monitor-command --hmp 10 info status
Please enter your authentication name: vdsm@rhevh
Please enter your password:
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)
  • pstack shows one thread of the qemu-kvm process with the following stack frames:
Thread 1 (Thread 0x7ffa76fdec40 (LWP 118925)):
#0  0x00007ffa772cdf9c in aio_bh_poll ()
#1  0x00007ffa772dd039 in aio_dispatch_clients ()
#2  0x00007ffa772dd58b in aio_poll_clients ()
#3  0x00007ffa772d19c4 in bdrv_drain_one ()
#4  0x00007ffa772d2c14 in bdrv_drain_all ()
#5  0x00007ffa772d842f in bdrv_close ()
#6  0x00007ffa772d86b7 in bdrv_unref ()
#7  0x00007ffa77310213 in mirror_exit ()
#8  0x00007ffa772db26c in block_job_defer_to_main_loop_bh ()
#9  0x00007ffa772cdfc4 in aio_bh_poll ()
#10 0x00007ffa772dd039 in aio_dispatch_clients ()
#11 0x00007ffa772cde3e in aio_ctx_dispatch ()
#12 0x00007ffa7551379a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#13 0x00007ffa772dbeb8 in main_loop_wait ()
#14 0x00007ffa770dac0e in main ()
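
The symptoms above can be checked for mechanically. The following is a minimal sketch, not a supported tool: it assumes you have already saved the vdsm log text and the pstack output as strings, and the function names (`find_lock_holders`, `stack_functions`, `shows_mirror_drain`) are hypothetical helpers introduced here for illustration. It only matches the exact error text and frame names quoted in this article.

```python
import re

# Error text as it appears in the vdsm log and virsh output quoted above.
# The job named in parentheses (the lock holder) is captured.
LOCK_TIMEOUT_RE = re.compile(
    r"Timed out during operation: cannot acquire state change lock "
    r"\(held by (?P<holder>\w+)\)"
)

# Frames #4 to #7 of the pstack output quoted above, innermost first.
DRAIN_FRAMES = ("bdrv_drain_all", "bdrv_close", "bdrv_unref", "mirror_exit")


def find_lock_holders(log_text):
    """Return the holder named in each state-change-lock timeout message."""
    return [m.group("holder") for m in LOCK_TIMEOUT_RE.finditer(log_text)]


def stack_functions(pstack_text):
    """Extract function names from frames of the form '#N  0xADDR in func ()'."""
    return re.findall(
        r"^#\d+\s+0x[0-9a-f]+ in (\w+) \(", pstack_text, re.MULTILINE
    )


def shows_mirror_drain(pstack_text):
    """True if the DRAIN_FRAMES sequence appears consecutively in the stack."""
    funcs = stack_functions(pstack_text)
    n = len(DRAIN_FRAMES)
    return any(
        tuple(funcs[i:i + n]) == DRAIN_FRAMES
        for i in range(len(funcs) - n + 1)
    )


if __name__ == "__main__":
    sample_log = (
        "TimeoutError: Timed out during operation: cannot acquire state "
        "change lock (held by remoteDispatchDomainBlockJobAbort)\n"
    )
    print(find_lock_holders(sample_log))
```

A match from both checks (a lock held by `remoteDispatchDomainBlockJobAbort` and a qemu-kvm thread in `mirror_exit` calling `bdrv_drain_all`) is consistent with the symptoms described in this article, but it is not proof of the same root cause.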

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.5, 3.6
