Linux guests under RHEV 3.0 hang or go to 'Down' status unexpectedly
Issue
- Guests under RHEV 3.0 unexpectedly hang (stop responding) or crash and reboot. This can be triggered by a number of circumstances:
  - The crash may occur during live migration, just after the "setting migration downtime to 300" message
  - The crash may occur when attempting to halt the guest
  - The crash may occur at other times
- The crash may have a backtrace such as:
--- <NMI exception stack> ---
#6 [ffff81031fcf6fd8] iret_label at ffffffff8005d67c
[ NMI exception stack recursion: prior stack location overwritten ]
Environment
- Red Hat Enterprise Virtualization (RHEV-M) 3.0
- RHEV 3.1 and 3.2 may also be affected, but it appears to be much harder to trigger this bug on those versions
- Hypervisor (RHEL or RHEV-H):
  - kernel-2.6.32-358.el6.x86_64
  - qemu-kvm-0.12.1.2-2.295.el6_3.5
- Linux guest (RHEL 5, RHEL 6, possibly other Linux distros)
- NMI watchdog is enabled in guest
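
To check whether a given guest matches the last condition, the following is a minimal sketch, not an official Red Hat tool. It assumes a Python 3 interpreter is available inside the guest (not the default on RHEL 5 or RHEL 6), that the kernel exposes the setting at /proc/sys/kernel/nmi_watchdog, and that any boot-time override appears as an nmi_watchdog= parameter in /proc/cmdline. The function names are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path


def nmi_watchdog_from_cmdline(cmdline):
    """Return the value of an nmi_watchdog= boot parameter, or None if absent."""
    match = re.search(r"(?:^|\s)nmi_watchdog=(\S+)", cmdline)
    return match.group(1) if match else None


def nmi_watchdog_enabled(sysctl_text, cmdline):
    """Best-effort guess at whether the NMI watchdog is enabled.

    The sysctl value is used when it could be read; otherwise the
    boot parameter is used as a fallback. Any value other than "0"
    is treated as enabled (an assumption of this sketch).
    """
    if sysctl_text is not None and sysctl_text.strip():
        return sysctl_text.strip() != "0"
    value = nmi_watchdog_from_cmdline(cmdline)
    return value is not None and value != "0"


def read_or_none(path):
    """Read a file as text, returning None if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    sysctl_text = read_or_none("/proc/sys/kernel/nmi_watchdog")
    cmdline = read_or_none("/proc/cmdline") or ""
    print("NMI watchdog enabled:", nmi_watchdog_enabled(sysctl_text, cmdline))
```

Run inside the guest, the script prints a single line reporting its best guess. A result of False suggests the guest does not meet the NMI watchdog condition listed above, although the heuristic may not cover every kernel version.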