System hang due to processes being starved by the CFS scheduler

Solution Unverified - Updated -

Issue

  • Unexpected reboots of 2 production modules.
  • Two systems at a critical production site, both running RHEL 6 Update 5, rebooted unexpectedly within the space of 1 hour.
  • Both production systems had been running in a stable state for around 3 months.
  • Both systems appear to have been running with a normal load pattern until 15 minutes before the incident.
  • Within a few minutes, the load from all application-related Java processes increased suddenly, and the load average went from 1+ to 200+.
  • The BMC watchdog timer did not receive periodic heartbeats from the OS for 120 seconds, so the BMC sent an NMI to the host OS.
  • Evaluation of possible abnormal network traffic and/or application issues as the cause of the sudden load increase is still in progress.
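
The load pattern described above (a 1-minute load average jumping from 1+ to 200+ within minutes) could be flagged early with a small check against /proc/loadavg. The following is a minimal sketch, not part of the original report: the function names and the `ratio` and `floor` thresholds are hypothetical and would need tuning for a real system.

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages.

    `text` is the content of /proc/loadavg, whose first three
    whitespace-separated fields are the load averages.
    """
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def is_load_spike(one_min, fifteen_min, ratio=10.0, floor=50.0):
    """Flag a sudden load increase.

    Assumed heuristic: the 1-minute average is at least `floor` and at
    least `ratio` times the 15-minute average. Both thresholds are
    illustrative values, not recommendations.
    """
    baseline = max(fifteen_min, 0.01)  # avoid a zero baseline on idle systems
    return one_min >= floor and one_min >= ratio * baseline


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages on a Linux host."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

For example, calling `read_loadavg()` periodically and passing the first and third values to `is_load_spike` would report a spike for a sample such as `200.45 35.10 12.02`, but not for a steady load around 1.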

Environment

  • Red Hat Enterprise Linux 6.5
  • CFS scheduler
  • Software RAID storage configuration using md raid1
