Limits on the maximum number of GFS mounts
Issue
Symptom/Problem
- GFS or GFS2 mount commands hang when the limit is reached
- groupd spins at 100% CPU
Environment
- Red Hat Enterprise Linux 5.3 or lower
- Global File System
- Global File System 2
Diagnostic Steps
- Verify whether any process, particularly groupd, is consuming 100% of a CPU
- While the mount is hung, gather:
date
ipcs -l
ipcs -a
ps axwwo user,pid,%cpu,%mem,vsz,rss,wchan=WIDE-WCHAN-COLUMN,stat,start,cputime,comm
exit
- Gather an strace of groupd, gfs_controld, and dlm_controld before mounting any GFS file systems:
# strace -f -T -ttt -o groupd.`hostname`.strace -p `pidof groupd`
# strace -f -T -ttt -o gfs_controld.`hostname`.strace -p `pidof gfs_controld`
# strace -f -T -ttt -o dlm_controld.`hostname`.strace -p `pidof dlm_controld`
- Examine the tail of the groupd strace for repeated semget() failures such as:
1251829462.024481 semget(0x73652549, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
1251829462.024515 geteuid() = 0 <0.000012>
1251829462.024546 semget(0x215ddbfd, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
1251829462.024576 geteuid() = 0 <0.000007>
1251829462.024608 semget(0x50ba4703, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000016>
1251829462.024644 geteuid() = 0 <0.000007>
1251829462.024671 semget(0x63f92b27, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000011>
1251829462.024706 geteuid() = 0 <0.000012>
1251829462.024736 semget(0x1dde5104, 3, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) <0.000010>
1251829462.024763 geteuid() = 0 <0.000009>
Resolution
- There is no such limit per se. The effective limits are the kernel's SysV IPC semaphore settings (visible via ipcs) and root's ulimit settings on this system.
- There are two components in this case:
- Look at 'ipcs -s' to see whether the node is already using 128 semaphore arrays, the default maximum, which would explain the blockage.
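As a quick sketch, the array count can be compared against the limit like this (the '^0x' pattern matches the key column in ipcs output):

```shell
# Count allocated semaphore arrays; compare against SEMMNI (default 128)
ipcs -s | grep -c '^0x'

# Show the configured semaphore limits, including "max number of arrays"
ipcs -ls
```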
- Be sure the CMAN semaphore leak erratum for BZ505594 (RHSA-2009:1341) is installed: cman-2.0.115-1.el5 or later.
- If the problem still persists with cman-2.0.115-1.el5 (or later), it will be necessary to resize the semaphore array pool (with care).
The default /proc/sys/kernel/sem fields are (in order):
- max semaphores per array (SEMMSL) = 250
- max semaphores system wide (SEMMNS) = 32000
- max ops per semop call (SEMOPM) = 32
- max number of arrays (SEMMNI)= 128
- GFS requires 3 semaphores per mount in addition to those required by device mapper
- Certain other processes on the system also need semaphores
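Resizing the pool is done through the kernel.sem tunable. The values below are illustrative only (SEMMNI doubled to 256, with SEMMNS raised to 250 * 256 = 64000 so that SEMMNS >= SEMMNI * SEMMSL still holds); choose values appropriate to the node:

```shell
# Current values, in order: SEMMSL SEMMNS SEMOPM SEMMNI
cat /proc/sys/kernel/sem

# Runtime change as root (lost at reboot); example values only
sysctl -w kernel.sem="250 64000 32 256"

# Make the change persistent across reboots
echo 'kernel.sem = 250 64000 32 256' >> /etc/sysctl.conf
```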
- The ENOSPC error code from semget indicates that SEMMNS and SEMMNI (the second and fourth fields) are being exhausted. SEMMNS should be equal to or greater than SEMMNI * SEMMSL.
- If increasing SEMMNS and SEMMNI does not have an effect, also check root's 'ulimit -n' setting, which could be too low.
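Root's open-file limit can be inspected, and raised if necessary (the 8192 value below is only an example, not a recommendation from this article):

```shell
# Show the current open-file limit for this shell
ulimit -n

# Raise it for the current session only (example value)
# ulimit -n 8192

# To persist, add entries to /etc/security/limits.conf (takes effect at next login):
# root  soft  nofile  8192
# root  hard  nofile  8192
```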
Root Cause
- A bug in cman that leaks semaphores.
- cman/openais exhausting the system's semaphores when the cluster has too many DLM resources.
- The limit on the maximum number of open files for root being too low.