KVM causes process lockups

I have a SuperMicro X9DRI-LN4F+_R1.2A server with an Adaptec 71605 RAID controller (SoftLayer). I am using KVM virtualisation and running 8 VMs. For storage, the VMs use file-based images (raw format). The machine is running RHEL 6.5 (up to date).

The problem is that copying large files (for instance the 32GB images) sometimes causes a process to freeze at 100% CPU. Killing the process with kill -9 17492 reports no error, but the process does not die. It only recovers when all VMs are shut down.

Usually it is the process I started myself (the copy) that locks up at 100% CPU, but occasionally some other process freezes instead. I currently have a locked-up sshd process:

top - 14:27:35 up 2 days,  7:15,  1 user,  load average: 4.40, 3.36, 3.35
Tasks: 478 total,   2 running, 476 sleeping,   0 stopped,   0 zombie
Cpu(s): 17.6%us,  4.7%sy,  0.0%ni, 77.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32844184k total, 32048980k used,   795204k free,    77428k buffers
Swap:  4194296k total,        0k used,  4194296k free,  9479800k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8356 qemu      20   0 5437m 4.0g 5116 S 396.6 12.7  62:44.72 qemu-kvm
17492 root      20   0 70024 3408 2612 R 100.0  0.0 802:44.09 sshd
 3154 qemu      20   0 6730m 5.9g 5096 S 10.3 18.9 198:04.79 qemu-kvm
 3055 qemu      20   0 2648m 2.0g 5524 S  5.6  6.4 463:21.61 qemu-kvm
 2497 root      20   0 1003m  15m 5348 S  5.3  0.0  13:31.13 libvirtd
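The top output shows the stuck sshd in state R at 100% CPU rather than in uninterruptible sleep (D). A pending SIGKILL is only acted on when a task returns from kernel mode, so one way to narrow this down is to check whether the CPU time is piling up as user time or as system time in /proc/<pid>/stat. The sketch below is only an illustration: the helper name parse_proc_stat and the SAMPLE line are made up for this example, and the field positions follow the layout described in proc(5).

```python
import sys


def parse_proc_stat(line):
    """Parse one line of /proc/<pid>/stat into a small dict.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],             # R, S, D, Z, ...
        "utime_ticks": int(fields[11]),  # field 14 in proc(5): user time
        "stime_ticks": int(fields[12]),  # field 15 in proc(5): system time
    }


# Hypothetical sample line (NOT captured from the affected server).
SAMPLE = "17492 (sshd) R 1 17492 17492 0 -1 4202752 500 0 0 0 12 4816397 0 0 20 0 1 0"


if __name__ == "__main__":
    # Usage: python procstat.py <pid>   (falls back to the sample line)
    if len(sys.argv) > 1:
        with open("/proc/%s/stat" % sys.argv[1]) as fh:
            text = fh.read()
    else:
        text = SAMPLE
    info = parse_proc_stat(text)
    print(info)
    if info["state"] == "R" and info["stime_ticks"] > info["utime_ticks"]:
        print("mostly system time: the task may be spinning inside the kernel")
```

If system time keeps growing between two runs while user time stays flat, the process is most likely looping inside the kernel and never reaching the point where the signal would be handled. That would fit the symptom that kill -9 has no visible effect.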

I came across this post from 2004 with a similar issue:

http://www.webhostingtalk.com/showthread.php?t=1273964

Can this be resolved in some way other than replacing the RAID controller? The controller firmware is already up to date.

Responses