High load average, crmd reports "High CPU load" increasingly as time passes, hung-task backtraces stuck in XFS calls in the logs, lvm commands become blocked, and/or corosync using 100% CPU in a RHEL 7 High Availability cluster

Solution Unverified - Updated -

Issue

  • We found our server with hundreds or even thousands of stuck netstat commands and a load average in the thousands, but very little CPU being used
  • Processes are getting stuck waiting in the XFS slab shrinker
  • We've detected high load on a cluster node and couldn't log in to the system. It was still a member but was unresponsive on the console or over ssh
  • corosync is using 100% CPU on only one node, load average is very high, and many processes like ps and netstat seem to be stuck
  • While corosync seems to be hogging an entire CPU, there are hung-task warnings in /var/log/messages showing processes stuck waiting in XFS functions
  • Why is corosync utilizing so much CPU on one of my nodes?
  • We had applications get stuck after processes hung waiting on something, and captured a vmcore. A number of processes are stuck waiting in xfs_fs_free_cached_objects
  • We are frequently seeing lvm commands block and LVM resource operations time out
  • corosync and clvmd both spin away with 100% CPU on one node in the cluster
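
Several of the symptoms above pair a very high load average with little CPU in use. On Linux the load average also counts tasks in uninterruptible sleep (state D), so one way to confirm that pattern is to count D-state processes by command name. The following is a minimal sketch that reads only /proc/loadavg and /proc/<pid>/stat; the function names are illustrative and are not taken from any Red Hat tool.

```python
import glob
from collections import Counter


def parse_stat_line(line):
    """Return (command, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    start = line.index("(")
    end = line.rindex(")")
    command = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return command, state


def blocked_commands(stat_lines):
    """Count commands whose state is D (uninterruptible sleep)."""
    counts = Counter()
    for line in stat_lines:
        command, state = parse_stat_line(line)
        if state == "D":
            counts[command] += 1
    return counts


def read_stat_lines():
    """Read /proc/<pid>/stat for every process that still exists."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as handle:
                lines.append(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    with open("/proc/loadavg") as handle:
        print("1-minute load average:", handle.read().split()[0])
    # Most frequent blocked commands first.
    for command, count in blocked_commands(read_stat_lines()).most_common():
        print(f"{count:6d}  {command}")
```

A large count of D-state ps or netstat processes alongside a high load average would match the behavior described in this section. The sketch does not show where those processes are blocked; the hung-task backtraces in /var/log/messages or a vmcore are still needed for that.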

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
