RHEL 5.7 huge load average
Hello,
After the last kernel update I am seeing a very high load average, more than 220, on a server with 16 virtual CPUs, 16 GB RAM and 4 GB swap, while only half of the RAM is in use and no swap at all.
The server's mail role is sendmail, and ps aux |egrep "D|Ds" shows more than 170 sendmail processes. Normally the server handles ~100000 emails/hour, and before the update the same volume of email raised the load to at most ~80.
Now ps alx shows wchan as "-", so there is no clue as to what I/O the processes are waiting for (I don't know how it was before, as I did not have this problem then).
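On Linux, processes in uninterruptible sleep (state D) count toward the load average, so it can help to count them exactly rather than with egrep "D|Ds", which also matches any line that merely contains a capital D. The following is only a sketch: the helper name count_d_state is made up for illustration, and it assumes the command name contains no spaces.

```shell
#!/bin/sh
# Count processes in uninterruptible sleep (state D), grouped by command name.
# count_d_state is a hypothetical helper name, not part of any package.
count_d_state() {
    # Reads "STAT COMMAND" lines on stdin and prints "count command",
    # highest count first. Only the first word of the command is used.
    awk '$1 ~ /^D/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# stat= and comm= suppress the header line so only process rows are parsed.
ps -eo stat=,comm= | count_d_state
```

If nearly all of the D-state processes turn out to be sendmail, that points at the storage holding the mail queue rather than at CPU.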
The current kernel version is kernel-2.6.18-371.8.1.el5.
I should add that there have been no changes in configuration, mail queue size or traffic.
Any advice is welcome.
Thank you
Responses
You could check what your IO scheduler is. It may perform better if you switch from cfq to deadline.
cat /sys/block/
The one inside [ ] is currently selected.
To update use:
echo sched_name > /sys/block/
Or append elevator=deadline to the kernel line in grub.conf
See this article for details:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Kernel_Boot_Parameters-The_IO_Scheduler.html
Though this doesn't answer why the load has increased, it may well help.
The post has stripped the disk name from the cat and echo commands above. They should be:
cat /sys/block/sda/queue/scheduler or cat /sys/block/vda/queue/scheduler
echo sched_name > /sys/block/sda/queue/scheduler or echo sched_name > /sys/block/vda/queue/scheduler
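To check every disk at once instead of naming sda or vda by hand, something like the following could be used. It is a rough sketch: the function name active_scheduler is invented here, and it relies only on the convention described above, that the selected scheduler is the one shown inside [ ].

```shell
#!/bin/sh
# Print the currently selected I/O scheduler for each block device.
# active_scheduler is a hypothetical helper name used for illustration.
active_scheduler() {
    # $1 is the content of a queue/scheduler file,
    # e.g. "noop anticipatory deadline [cfq]"; print the bracketed name.
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

A change made with echo lasts only until the next reboot; the elevator= kernel parameter is what makes it permanent.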
Also - sendmail typically warrants specific mount options if your system is particularly demanding...
I believe relatime is an accepted practice for /var/spool (some question whether to use noatime) - I, personally, don't know enough about this topic any more ;-)
There is a book "Sendmail Performance Tuning" By Nick Christenson which talked a bit about the mount options, etc... (page 40).
The whole book might be available on Google Books, but you should be able to find the selection I am referring to by Googling "sendmail performance /etc/fstab"
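Before changing anything in /etc/fstab, it is worth seeing which options the spool filesystem is mounted with right now. A minimal sketch, assuming /var/spool is its own mount point (if it is not, nothing is printed and the parent filesystem should be checked instead); mount_opts is a made-up helper name.

```shell
#!/bin/sh
# Show the mount options currently in effect for a given mount point.
# mount_opts is a hypothetical helper name used for illustration.
mount_opts() {
    # $1 = mount point; stdin = lines in /proc/mounts format
    # (device, mount point, fs type, options, dump, pass).
    awk -v mp="$1" '$2 == mp { print $4 }'
}

if [ -r /proc/mounts ]; then
    mount_opts /var/spool < /proc/mounts
fi
```

If neither noatime nor relatime appears in the output, every read of a queue file also triggers an access-time write, which is the overhead those options are meant to avoid.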
These types of problems are certainly fun (seemingly random without a good identifier as to what the issue might be ;-)
I ran into a similar issue a while back, also on RHEL 5, with the named service on the box. I believe POSIX permissions and SELinux settings had been altered by something, and named was consuming an entire processor.
I would recommend opening a case to have Red Hat dig up the possible related causes for you.
