VM migrations stalling and failing to complete in a RHEV environment

Solution Verified - Updated -

Issue

  • Some VM migrations failed to converge: memory pages within the guest were being modified faster than they could be transferred to the destination host.

  • This shows up in the VDSM logs as a "Migration stalling" warning, for example:

Thread-70::WARNING::2016-01-06 11:11:52,092::migration::468::vm.Vm::(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::Migration stalling: remaining (1521MiB) > lowmark (642MiB). Refer to RHBZ#919201.
Thread-70::INFO::2016-01-06 11:11:52,093::migration::477::vm.Vm::(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::Migration Progress: 990 seconds elapsed, 82% of data processed

  • Some migrations still failed regardless of how the VDSM migration parameters were set.

  • Even with settings that allowed migrations to use the full network bandwidth, actual usage during these migrations stayed far below it.
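
To check how often a host hits this condition, the stalling warnings can be pulled out of the VDSM log with a short script. The following is a minimal sketch, not a supported tool: the regular expression is derived only from the single log line quoted above, so other VDSM versions may format the message differently, and the helper name `find_stalls` is made up for this example.

```python
import re

# Pattern built from the "Migration stalling" warning quoted in this article.
# Assumption: other VDSM versions may word or format the message differently.
STALL_RE = re.compile(
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::Migration stalling: "
    r"remaining \((?P<remaining>\d+)MiB\) > lowmark \((?P<lowmark>\d+)MiB\)"
)


def find_stalls(lines):
    """Yield (vm_id, remaining_mib, lowmark_mib) for each stalling warning."""
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            yield (
                match.group("vm_id"),
                int(match.group("remaining")),
                int(match.group("lowmark")),
            )


# The two log lines from this article, used as sample input.
sample = [
    "Thread-70::WARNING::2016-01-06 11:11:52,092::migration::468::vm.Vm::"
    "(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::"
    "Migration stalling: remaining (1521MiB) > lowmark (642MiB). "
    "Refer to RHBZ#919201.",
    "Thread-70::INFO::2016-01-06 11:11:52,093::migration::477::vm.Vm::"
    "(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::"
    "Migration Progress: 990 seconds elapsed, 82% of data processed",
]

stalls = list(find_stalls(sample))
for vm_id, remaining, lowmark in stalls:
    print(f"{vm_id}: {remaining} MiB remaining, lowmark {lowmark} MiB")
```

To run it against a real host, pass an open log file to `find_stalls` instead of `sample`; the log path depends on the installation and is not given in this article.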

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.5
  • Red Hat Enterprise Linux (RHEL) 6.6 hosts
  • 1 Gbit/s network
  • No separate migration network; the rhevm management network carried the migration traffic
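
The convergence problem described under Issue can be illustrated with some rough arithmetic for a 1 Gbit/s link. This is a deliberately simplified constant-rate model, not how QEMU or VDSM actually schedule page transfers; the function names and the dirty-rate figures are hypothetical and chosen only for illustration. Only the 1 Gbit/s link speed and the 1521 MiB remaining figure come from this article.

```python
MIB = 1024 * 1024


def max_throughput_mib_s(link_bits_per_sec):
    """Theoretical line rate in MiB/s, ignoring protocol overhead."""
    return link_bits_per_sec / 8 / MIB


def seconds_to_drain(remaining_mib, transfer_mib_s, dirty_mib_s):
    """Time to copy the remaining memory under constant rates.

    Returns None when the guest dirties memory at least as fast as it is
    transferred, i.e. the migration never converges in this model.
    """
    net_rate = transfer_mib_s - dirty_mib_s
    if net_rate <= 0:
        return None
    return remaining_mib / net_rate


line_rate = max_throughput_mib_s(1_000_000_000)  # about 119 MiB/s
print(f"1 Gbit/s line rate: {line_rate:.1f} MiB/s")

# Hypothetical dirty rates (MiB/s) for a guest with 1521 MiB still to copy.
for dirty in (0, 60, 120):
    result = seconds_to_drain(1521, line_rate, dirty)
    if result is None:
        print(f"dirty rate {dirty} MiB/s: does not converge")
    else:
        print(f"dirty rate {dirty} MiB/s: about {result:.0f} s to drain")
```

In this model, any reduction in effective transfer rate (for example, sharing the management network with other traffic) narrows the gap between transfer rate and dirty rate and lengthens or prevents convergence.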
