VM migrations stalling and failing to complete in a RHEV environment
Issue
-
Some VM migrations failed to converge: memory pages within the guest were being modified faster than they could be transferred to the destination host.
-
An indication of this is seen in the VDSM logs, for example:
Thread-70::WARNING::2016-01-06 11:11:52,092::migration::468::vm.Vm::(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::Migration stalling: remaining (1521MiB) > lowmark (642MiB). Refer to RHBZ#919201.
Thread-70::INFO::2016-01-06 11:11:52,093::migration::477::vm.Vm::(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::Migration Progress: 990 seconds elapsed, 82% of data processed
-
Some migrations would still fail regardless of VDSM parameter settings.
-
Even with settings that allowed the full network bandwidth to be used, these migrations used nowhere near the full bandwidth.
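The stalling warning quoted above can be picked out of a VDSM log mechanically. The following is a minimal sketch in Python, assuming the message format matches the sample line exactly. The function name `parse_stall_warning` and the returned field names are hypothetical and are not part of VDSM.

```python
import re

# Pattern derived only from the sample warning quoted in this article;
# other VDSM versions may word the message differently (assumption).
STALL_RE = re.compile(
    r"vmId=`(?P<vm_id>[0-9a-f-]+)`::Migration stalling: "
    r"remaining \((?P<remaining>\d+)MiB\) > lowmark \((?P<lowmark>\d+)MiB\)"
)


def parse_stall_warning(line):
    """Return the VM id, remaining MiB and lowmark MiB, or None if no match."""
    match = STALL_RE.search(line)
    if match is None:
        return None
    return {
        "vm_id": match.group("vm_id"),
        "remaining_mib": int(match.group("remaining")),
        "lowmark_mib": int(match.group("lowmark")),
    }


# The warning line from the log excerpt above.
SAMPLE = (
    "Thread-70::WARNING::2016-01-06 11:11:52,092::migration::468::vm.Vm::"
    "(monitor_migration) vmId=`ae7cd74f-79d4-403d-858f-6fb37eb8ee1d`::"
    "Migration stalling: remaining (1521MiB) > lowmark (642MiB). "
    "Refer to RHBZ#919201."
)

if __name__ == "__main__":
    print(parse_stall_warning(SAMPLE))
```

Lines that do not contain the warning return `None`, so the function can be applied to every line of a log file and the non-matching results discarded.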
Environment
- Red Hat Enterprise Virtualization (RHEV) 3.5
- Red Hat Enterprise Linux (RHEL) 6.6 hosts
- 1 Gbit/s network
- No separate migration network; the rhevm management network was used
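To put the link speed in context, the arithmetic below gives the raw ceiling of a 1 Gbit/s link and the basic condition for a migration to converge. It is a rough sketch only: it ignores protocol overhead and other traffic on the shared management network. The helper names and any dirty-rate figure passed to them are hypothetical and were not measured in this environment.

```python
MIB = 1024 * 1024  # bytes per MiB


def line_rate_mib_per_s(gbit_per_s):
    """Raw link rate in MiB/s, ignoring protocol overhead."""
    return gbit_per_s * 1e9 / 8 / MIB


def can_converge(dirty_rate_mib_per_s, transfer_rate_mib_per_s):
    """Memory copy can only catch up if pages are dirtied more slowly
    than they are transferred."""
    return dirty_rate_mib_per_s < transfer_rate_mib_per_s


def min_seconds_for(remaining_mib, transfer_rate_mib_per_s):
    """Lower bound on the time to send the remaining data, assuming the
    guest dirties no further pages meanwhile."""
    return remaining_mib / transfer_rate_mib_per_s


if __name__ == "__main__":
    rate = line_rate_mib_per_s(1.0)  # roughly 119 MiB/s for 1 Gbit/s
    print(f"line rate: {rate:.1f} MiB/s")
    # 1521 MiB is the 'remaining' figure from the stalling warning.
    print(f"best case for 1521 MiB: {min_seconds_for(1521, rate):.1f} s")
```

At full line rate the remaining 1521 MiB from the warning would take under 13 seconds to send, so a migration still stalling after 990 seconds is consistent with the observation that the link was far from saturated.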