High load average, crmd reports "High CPU load" increasingly as time passes, hung-task backtraces stuck in XFS calls in the logs, lvm commands become blocked, and/or corosync using 100% CPU in a RHEL 7 High Availability cluster
Issue
- We found our server with hundreds or even thousands of stuck `netstat` commands and a load average in the thousands, but very little CPU being used
- Processes are getting stuck waiting in the XFS slab shrinker
- We've detected high load on a cluster node and couldn't log in to the system. It was still a member but was unresponsive on the console or over ssh
- `corosync` is using 100% CPU on only one node, load average is very high, and many processes like `ps` and `netstat` seem to be stuck
- While `corosync` seems to be hogging an entire CPU, there are hung-task warnings in `/var/log/messages` showing processes stuck waiting in XFS functions
- Why is `corosync` utilizing so much CPU on one of my nodes?
- We had applications get stuck after processes hung waiting on something, and captured a vmcore. A number of processes are stuck waiting in `xfs_fs_free_cached_objects`
- We are frequently seeing `lvm` commands block and `LVM` resource operations time out
- `corosync` and `clvmd` both spin away with 100% CPU on one node in the cluster
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
