KVM causes process lockups
I have a Supermicro X9DRI-LN4F+_R1.2A server with an Adaptec 71605 RAID controller (SoftLayer). I am using KVM virtualisation and running 8 VMs. For storage, the VMs use file-based images (raw format). The machine is running RHEL 6.5 (up to date).
The problem is that copying large files (for instance, the 32 GB images) sometimes causes a process to freeze at 100% CPU. Killing the process with kill -9 17492 reports no error, but the process is not killed. The process only recovers when all VMs are shut down.
Mostly it is the process I am running (the copy) that locks up at 100% CPU, but occasionally some other process freezes instead. I currently have a locked-up sshd process.
top - 14:27:35 up 2 days, 7:15, 1 user, load average: 4.40, 3.36, 3.35
Tasks: 478 total, 2 running, 476 sleeping, 0 stopped, 0 zombie
Cpu(s): 17.6%us, 4.7%sy, 0.0%ni, 77.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32844184k total, 32048980k used, 795204k free, 77428k buffers
Swap: 4194296k total, 0k used, 4194296k free, 9479800k cached
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 8356 qemu      20   0 5437m 4.0g 5116 S 396.6 12.7  62:44.72 qemu-kvm
17492 root      20   0 70024 3408 2612 R 100.0  0.0 802:44.09 sshd
 3154 qemu      20   0 6730m 5.9g 5096 S  10.3 18.9 198:04.79 qemu-kvm
 3055 qemu      20   0 2648m 2.0g 5524 S   5.6  6.4 463:21.61 qemu-kvm
 2497 root      20   0 1003m  15m 5348 S   5.3  0.0  13:31.13 libvirtd
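The S column above shows sshd in state R (running) rather than D (uninterruptible sleep). A process that keeps spinning inside the kernel never gets back to the point where a pending SIGKILL is acted on, which would be consistent with kill -9 reporting no error and having no effect. The following is a minimal sketch for reading that state straight from /proc, in Python. The helper names parse_state and read_state are hypothetical, and the only thing assumed about the system is the standard /proc/<pid>/stat layout.

```python
def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name (second field) is wrapped in parentheses and may
    itself contain spaces or ')', so split after the LAST ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 2:]
    return rest.split()[0]


def read_state(pid):
    """Read the current state of a live process (hypothetical helper)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_state(f.read())


if __name__ == "__main__":
    import os
    # Example: state of this script itself; replace with the stuck PID.
    print(read_state(os.getpid()))
```

If the kernel exposes it, /proc/<pid>/stack (readable as root) may also show where in the kernel the task is looping, but whether that file is available on a given kernel build is an assumption to verify.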
I came across this post from 2004 with a similar issue:
http://www.webhostingtalk.com/showthread.php?t=1273964
Is this something that can be resolved other than by replacing the RAID controller? The firmware of the controller is already up to date.
Responses
Sounds like a tough issue to nail down.
Are you using LVM for the images (I assume /var/lib/libvirt/images)?
Do you use any special mount options for your images (or is the directory just on /)?
Was that directory set up at server build, or after (on separate LUNs)?
I have run into issues with partition alignment on SAN LUNs (not sure if that would translate to a local RAID controller in a similar way).
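To make the alignment question concrete, here is a rough sketch in Python of how one might check it: a partition is aligned when its starting byte offset is a multiple of the RAID stripe size. The function names are hypothetical, the 256 KiB stripe size is only a placeholder (the real value has to come from the controller configuration), and sda/sda1 are example device names. The start offset is read from sysfs, which reports it in 512-byte units.

```python
SYSFS_SECTOR_BYTES = 512  # sysfs 'start' is always in 512-byte units


def is_aligned(start_sector, stripe_bytes):
    """True if the partition's byte offset is a multiple of the stripe size."""
    return (start_sector * SYSFS_SECTOR_BYTES) % stripe_bytes == 0


def read_start_sector(disk, partition):
    """Read a partition's start offset from sysfs (hypothetical helper)."""
    path = "/sys/block/%s/%s/start" % (disk, partition)
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    stripe = 256 * 1024  # ASSUMPTION: placeholder stripe size, check the controller
    start = read_start_sector("sda", "sda1")  # example device names
    print("start sector %d aligned: %s" % (start, is_aligned(start, stripe)))
```

A partition starting at sector 2048 (1 MiB) passes this check for any power-of-two stripe size up to 1 MiB, while the old DOS default of sector 63 does not.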
Also - did you review this:
http://download.adaptec.com/pdfs/readme/series-7-8-controller_readme_12_2013.pdf
This guy has some good info (you'll have to sort through it)
http://wiki.mikejung.biz/index.php?title=Hardware
