GFS2 filesystems intermittently hang and glock_workqueue processes use 100% CPU in RHEL 5

Updated 2013-11-08T16:44:38+00:00

Issue

  • When a large number of cached glocks build up for GFS2 and memory pressure causes a flush of the cache, the CPU utilization of glock_workqueue becomes very high, possibly causing the system to become unresponsive.
  • When the page cache is flushed out, the CPU utilization of glock_workqueue spikes.
  • We are seeing high load on our clusters. This high load is due to CPU utilization. There is an associated large drop in page cache for the GFS2 filesystem during the high load. When the large page cache drop completes, the load goes back down to normal. CPU usage by the glock_workqueue processes increases during this high-load period. Very little I/O occurs during the high-load event.
  • A hung process was halted, and during that time the system stopped functioning for all existing users. A glock service spiked to 100% CPU; we were not able to start any new ssh sessions, and existing ssh users were kicked off. The symptom went away by the time we were able to gain access... We saw a huge spike in load and CPU usage, but I/O was at normal levels. Multiple instances of glock_workqueue were using 100% CPU.
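
To check whether a node matches the symptoms above, it can help to list the glock_workqueue threads that are consuming the most CPU. The following is a minimal sketch, not a Red Hat tool: the function name busy_glock_workers and the 90% threshold are assumptions made for illustration. It expects text in the three-column form produced by "ps -eo pid,pcpu,comm". Note that the pcpu value from ps is averaged over the lifetime of the process, so output sampled from a tool such as top in batch mode may reflect short spikes more accurately.

```python
def busy_glock_workers(ps_output, threshold=90.0):
    """Return (pid, cpu) pairs for glock_workqueue threads at or above threshold.

    ps_output is text with three whitespace-separated columns per line:
    PID, CPU percentage, command name (for example from `ps -eo pid,pcpu,comm`).
    The function name and default threshold are illustrative assumptions.
    """
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, pcpu, comm = parts
        if not comm.strip().startswith("glock_workqueue"):
            continue  # also skips the header line
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # CPU column was not numeric
        if cpu >= threshold:
            hits.append((int(pid), cpu))
    return hits
```

An empty result during a high-load event suggests the load has a different cause than the one described in this article.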

Environment

  • Red Hat Enterprise Linux 5 (RHEL5) with the Resilient Storage Add On
    • Observed in RHEL 5 Update 7 - Update 9
  • GFS2 file system(s) mounted on multiple nodes
  • Often triggered by a backup utility running against GFS2 file system(s), such as Symantec NetBackup
