Limits on the maximum number of GFS mounts


Issue

Symptom/Problem

  • The GFS or GFS2 mount command hangs when the limit is reached
  • groupd spins, consuming 100% CPU

Environment

  • Red Hat Enterprise Linux 5.3 or lower
  • Global File System
  • Global File System 2

Diagnostic Steps

  • Verify whether any process is consuming 100% of a CPU, in particular groupd.
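    For example, a generic check with top or ps (any process monitor will do):
    # top -b -n 1 | head -20
    # ps axo pid,pcpu,comm | sort -k2 -nr | head
    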
  • While the mount is hung, gather the following:
    date
    ipcs -l
    ipcs -a
    ps axwwo user,pid,%cpu,%mem,vsz,rss,wchan=WIDE-WCHAN-COLUMN,stat,start,cputime,comm
    exit
    
  • Gather an strace of groupd, gfs_controld, and dlm_controld before mounting any GFS file systems:
    # strace -f -T -ttt -o groupd.`hostname`.strace -p `pidof groupd`
    # strace -f -T -ttt -o gfs_controld.`hostname`.strace -p `pidof gfs_controld`
    # strace -f -T -ttt -o dlm_controld.`hostname`.strace -p `pidof dlm_controld`
    
  • Examine the tail of the groupd strace for repeated semget() failures like the following:
    1251829462.024481 semget(0x73652549, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
    1251829462.024515 geteuid()            = 0 <0.000012>
    1251829462.024546 semget(0x215ddbfd, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
    1251829462.024576 geteuid()            = 0 <0.000007>
    1251829462.024608 semget(0x50ba4703, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000016>
    1251829462.024644 geteuid()            = 0 <0.000007>
    1251829462.024671 semget(0x63f92b27, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000011>
    1251829462.024706 geteuid()            = 0 <0.000012>
    1251829462.024736 semget(0x1dde5104, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
    1251829462.024763 geteuid()            = 0 <0.000009>
    

Resolution

  • There is no such hard limit per se; the effective limits come from the kernel IPC (semaphore) settings, visible with ipcs, and the ulimit settings for this system/user.
  • There are several things to check in this case:
  1. Look at 'ipcs -s' and check whether the node is already using 128 semaphore arrays, the default maximum, which would explain the blockage.
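    For example, count the semaphore arrays in use and compare against the configured maximum:
    # ipcs -s | grep -c '^0x'
    # ipcs -ls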
  2. Be sure the cman semaphore-leak erratum is installed: BZ505594 / RHSA-2009:1341, cman-2.0.115-1.el5 or later.
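    To confirm the installed version:
    # rpm -q cman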

  3. With cman-2.0.115-1.el5 (or later) installed, if the problem still persists, it will be necessary to resize the semaphore array pool (with care); an example sysctl change is shown after the notes below.

    • The default /proc/sys/kernel/sem fields are (in order):

      • max semaphores per array (SEMMSL) = 250
      • max semaphores system wide (SEMMNS) = 32000
      • max ops per semop call (SEMOPM) = 32
      • max number of arrays (SEMMNI)= 128
    • GFS requires 3 semaphores per mount, in addition to those required by device-mapper
    • Certain other processes on the system also need semaphores
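    • As an illustration only (size the values for the specific system), the pool can be resized at runtime and persisted via sysctl, keeping the field order SEMMSL SEMMNS SEMOPM SEMMNI:
      # sysctl -w kernel.sem="250 64000 32 256"
      # echo "kernel.sem = 250 64000 32 256" >> /etc/sysctl.conf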
  4. The ENOSPC error code from semget() indicates that SEMMNS or SEMMNI (the second and fourth fields of /proc/sys/kernel/sem) is being reached

  5. SEMMNS should be equal to or greater than SEMMNI * SEMMSL
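    For example, keeping the default SEMMSL of 250 while doubling SEMMNI from 128 to 256 means SEMMNS should be raised from 32000 to at least 250 * 256 = 64000.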
  6. If increasing SEMMNS and SEMMNI has no effect, also check root's 'ulimit -n' setting (maximum open files), which could also be too low.
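    For example, to check and (temporarily, for the current shell) raise the limit:
    # ulimit -n
    # ulimit -n 8192
    A persistent change for login sessions can be made in /etc/security/limits.conf.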

Root Cause

  1. A bug in cman that leaks semaphores.
  2. cman/openais exhausting the available semaphores on the cluster when there are too many DLM resources.
  3. The limit on the maximum number of open files for root.
