GFS2 performance degrades for multi-node/grid processing over several days in RHEL 5

Solution Unverified - Updated -

Issue

  • Batch processing jobs in the IIS application running on GFS2 hung, and the production cluster had to be restarted
  • Nodes processing data on GFS2 perform fine after reboot, but day after day the performance gets worse
  • I have one node in the cluster that interacts with a large number of files over time through backups and maintenance scripts, and other nodes process the same data frequently. The processing performance on the other nodes gets worse and worse over time
  • When my nodes start processing batch jobs, one node shows very high CPU usage from dlm_recv and processing on the other nodes is very slow

Environment

  • Red Hat Enterprise Linux (RHEL) 5 Update 10 with the Resilient Storage Add On
  • GFS2
    • File systems used for grid or batch type processing jobs
    • File systems are accessed heavily by one node, touching a large number of the files that will later be used by other nodes (such as in backup jobs, maintenance scripts, etc.)
    • Issues such as this are more likely to occur in larger clusters of 4 or more nodes that all heavily access the same file systems
