- Red Hat Enterprise Linux 6.2
- KVM host with Intel processor supporting "PAUSE-loop exiting"
- KVM host with a large number of real CPUs
- KVM virtual machines with a large number of restrictively pinned virtual CPUs
A Red Hat Enterprise Linux 6.2 KVM host can become unresponsive or incur a kernel panic such as
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu ...
if it is running virtual machines that have a large number of virtual CPUs pinned restrictively to real CPUs. This issue was originally observed on a host with 80 real CPUs running two virtual machines. Each of the virtual machines was using 40 virtual CPUs that were pinned to separate real CPUs, for example:
virtual CPU 0 of virtual machine A pinned to real CPU 0
virtual CPU 1 of virtual machine A pinned to real CPU 1
...
virtual CPU 38 of virtual machine B pinned to real CPU 78
virtual CPU 39 of virtual machine B pinned to real CPU 79
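For illustration, such a restrictive one-to-one pinning could have been established with virsh vcpupin. This is only a sketch; the domain names vmA and vmB are examples, not taken from the original report:

```shell
# Pin each of the 40 virtual CPUs of guest "vmA" to exactly one
# real CPU (0-39) ...
for vcpu in $(seq 0 39); do
    virsh vcpupin vmA "$vcpu" "$vcpu"
done

# ... and each of the 40 virtual CPUs of guest "vmB" to exactly one
# real CPU (40-79).
for vcpu in $(seq 0 39); do
    virsh vcpupin vmB "$vcpu" "$((vcpu + 40))"
done
```

These commands require a running libvirt host with the named domains and cannot be executed elsewhere.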
 In this context, "real CPU" means either a processor core with a single thread or a hyperthread of a processor core.
This issue can only occur on KVM hosts with Intel processors that support the "PAUSE-loop exiting" VM execution control.
This is tracked by RHBZ#827031 (bug not publicly accessible, please contact your Red Hat Support representative if more information is required).
Using a less restrictive pinning of virtual CPUs can work around the issue.
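For example, the pinning could be relaxed so that each virtual CPU is allowed to run on any real CPU of a whole range instead of exactly one. A sketch using virsh; the domain name vmA is an example:

```shell
# Allow every virtual CPU of guest "vmA" to run on any of
# real CPUs 0-39 instead of a single dedicated real CPU.
for vcpu in $(seq 0 39); do
    virsh vcpupin vmA "$vcpu" 0-39
done
```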
Loading the kvm-intel kernel module with the parameter ple_gap=0 disables the "PAUSE-loop exiting" feature. This may be a possible alternative to work around the issue.
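Assuming all virtual machines are shut down (unloading the module requires this), the module could be reloaded with the parameter as follows. The file name under /etc/modprobe.d/ is an example:

```shell
# Reload kvm_intel with PAUSE-loop exiting disabled
modprobe -r kvm_intel
modprobe kvm_intel ple_gap=0

# Verify the active parameter value
cat /sys/module/kvm_intel/parameters/ple_gap

# Make the setting persistent across reboots (file name is an example)
echo "options kvm_intel ple_gap=0" > /etc/modprobe.d/kvm-intel-ple.conf
```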
Red Hat Enterprise Linux 6.2 introduces a change in the KVM kernel modules that can improve the performance of certain workloads if the guest kernel uses the "pause" processor instruction inside spinlock loops. A flaw in this change can cause excessive contention on run queue locks in the KVM host kernel. The locks affected by the contention belong to the run queues of the real CPUs that are allocated to virtual CPU 0 of a guest. Excessive contention can either render the host unresponsive or cause the aforementioned kernel panic.
Review the CPU configuration of the KVM virtual machines to determine whether virtual CPUs are pinned to real CPUs very restrictively.
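The current affinity of each virtual CPU can be displayed with virsh; the domain name vmA is an example. The pinning is restrictive in the sense of this article when the "CPU Affinity" of each virtual CPU consists of a single real CPU:

```shell
# List every virtual CPU of guest "vmA" with its current
# real-CPU affinity ("CPU Affinity" field)
virsh vcpuinfo vmA

# Alternatively, show only the pinning in compact form
virsh vcpupin vmA
```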
If a vmcore is available, check whether the stack traces of many active threads contain the kvm_vcpu_on_spin() function, similar to the following example:
PID: 39297  TASK: ffff881ff02134c0  CPU: 2  COMMAND: "qemu-kvm"
...
--- <NMI exception stack> ---
 #6 [ffff881feeee3b68] _spin_lock at ffffffff814ef341
 #7 [ffff881feeee3b70] double_rq_lock at ffffffff810519fc
 #8 [ffff881feeee3ba0] yield_to at ffffffff814ed2a1
 #9 [ffff881feeee3bf0] kvm_vcpu_on_spin at ffffffffa0328494 [kvm]
#10 [ffff881feeee3c50] handle_pause at ffffffffa02832ce [kvm_intel]
#11 [ffff881feeee3c70] vmx_handle_exit at ffffffffa0283ae1 [kvm_intel]
#12 [ffff881feeee3cb0] kvm_arch_vcpu_ioctl_run at ffffffffa033d97d [kvm]
...
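One way to perform this check is with the crash utility, sketched below. The vmlinux and vmcore paths are examples only and must be adjusted to the installed kernel-debuginfo package and the actual dump location:

```shell
# Open the dump (both paths are examples)
crash /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux \
      /var/crash/vmcore

# At the crash> prompt, collect the backtraces of all tasks and count
# how many of them contain kvm_vcpu_on_spin:
crash> foreach bt > /tmp/backtraces.txt
crash> !grep -c kvm_vcpu_on_spin /tmp/backtraces.txt
```

A large count relative to the number of virtual CPUs suggests the contention described under Root Cause.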
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.