Normal user is unable to log in to the system, with the error "fork: Resource temporarily unavailable"
Environment
- Red Hat Enterprise Linux 6
Issue
- The "su - <user>" command failed with the error "Resource temporarily unavailable".
- A vmcore was captured during this issue to determine the root cause.
# su - <user>
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
Resolution
- Increase the value of the "nproc" parameter for the affected user, or for all users, in /etc/security/limits.d/90-nproc.conf.
> Here is an example of the /etc/security/limits.d/90-nproc.conf file:
<user> - nproc 2048 <<<----[ Only for "<user>" user ]
* - nproc 2048 <<<----[ For all users ]
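NOTE: The above is only an example snippet. Increase the nproc value from its current setting (for example, to double or triple it) according to your requirements. pam_limits applies limits at login, so the new value takes effect only for sessions started after the change.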
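It helps to see which nproc values pam_limits will pick up, both before and after the change. pam_limits reads /etc/security/limits.conf first and then the *.conf files under /etc/security/limits.d/, so you can list the entries from both locations. The following is a minimal bash sketch. The function name show_nproc_entries is hypothetical, and so is its optional directory argument, which exists only so the function can be pointed at a test copy of the files.

```shell
#!/usr/bash
# Hypothetical helper: print every uncommented nproc entry found in
# limits.conf and in the *.conf files under limits.d, prefixed with the file name.
# The base directory defaults to /etc/security but can be overridden.
show_nproc_entries() {
    local dir="${1:-/etc/security}"
    local f
    for f in "$dir/limits.conf" "$dir"/limits.d/*.conf; do
        [ -r "$f" ] || continue          # skip missing files and unmatched globs
        # Entry format: <domain> <type> <item> <value>
        # Trailing comments are stripped first, so "#" lines become empty.
        awk -v file="$f" '
            { sub(/#.*/, "") }
            NF >= 4 && $3 == "nproc" { print file ": " $1, $2, $4 }
        ' "$f"
    done
}
```

For example, run show_nproc_entries as root, edit the file, and run it again. Then open a new login session for the user and confirm the effective soft and hard limits with: su - <user> -c 'ulimit -Su; ulimit -Hu'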
Root Cause
- The system was not able to create new processes because the nproc limit configured through pam_limits (/etc/security/limits.conf and the files under /etc/security/limits.d/) had been reached.
- The processes started by user "test" (uid 702638) had reached their soft limit.
- The soft limit for the number of processes (nproc) was set to 1024. On RHEL 6, 1024 is the default soft nproc value set for all users in /etc/security/limits.d/90-nproc.conf.
- A total of 1023 processes were running on this system with uid 702638. The nproc limit counts tasks, so each thread counts separately.
Diagnostic Steps
- Determine the total number of processes (tasks) on the system.
crash> ps | wc -l
1370 <<<-----[ Total number of tasks on the system, plus one line for the ps header ]
- Determine the process name with the highest number of instances.
crash> ps | gawk '{count[$NF]++}END{for(j in count) print ""count[j]":",j}'|sort -rn|head -n20
1021: java <<<-----[ 1021 tasks are "java" ]
64: console-kit-dae
57: klzagent
56: kloagent
7: [kdmflush]
7: [ext4-dio-unwrit]
7: avagent.bin
6: multipathd
6: mingetty
6: elxhbamgrd
5: .vasd
4: rscd
4: collect
4: automount
3: udevd
3: sshd
3: sh
3: rsyslogd
2: sleep
2: sendmail
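The gawk one-liner above counts how often each value in the last column (COMM) of crash's ps output appears. The sketch below is a more readable bash version of the same idea. It assumes the ps output was first saved to a text file, for example with crash's output redirection (crash> ps > ps.txt). It also skips the header line so that "COMM" is not counted as a command. crash's ps lists every task, so each thread of a JVM shows up as its own "java" line. The function name count_by_comm is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count tasks per command name in saved crash "ps" output.
# Reads the ps text on stdin; the first line is assumed to be the ps header.
# The optional argument limits how many of the top entries are printed.
count_by_comm() {
    local top="${1:-20}"
    awk 'NR > 1 { count[$NF]++ }
         END    { for (c in count) print count[c] ": " c }' |
        sort -rn | head -n "$top"
}
```

Usage: count_by_comm < ps.txt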
- Determine the "NPROC" limit of the process with the highest number of instances (i.e. "java"). Any "java" PID from the ps output can be used; 32724 is used below.
crash> set 32724
PID: 32724
COMMAND: "java"
TASK: ffff8801298ce040 [THREAD_INFO: ffff8801030b2000]
CPU: 0
STATE: TASK_INTERRUPTIBLE
crash> ps -r 32724
PID: 32724 TASK: ffff8801298ce040 CPU: 0 COMMAND: "java"
RLIMIT CURRENT MAXIMUM
CPU (unlimited) (unlimited)
FSIZE (unlimited) (unlimited)
DATA (unlimited) (unlimited)
STACK 10485760 (unlimited)
CORE 0 (unlimited)
RSS (unlimited) (unlimited)
NPROC 1024 30527
NOFILE 8192 8192
MEMLOCK 65536 65536
AS (unlimited) (unlimited)
LOCKS (unlimited) (unlimited)
SIGPENDING 30527 30527
MSGQUEUE 819200 819200
NICE 0 0
RTPRIO 0 0
RTTIME (unlimited) (unlimited)
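The steps above read the limits from a vmcore. On a running system, the same soft (CURRENT) and hard (MAXIMUM) NPROC values for a process appear in the "Max processes" row of /proc/<pid>/limits. The following is a minimal bash sketch; the function name nproc_limits_of_pid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft and hard "Max processes" limit
# (RLIMIT_NPROC) of a running process, read from /proc/<pid>/limits.
nproc_limits_of_pid() {
    local pid="$1"
    # Row format: "Max processes   <soft>   <hard>   processes"
    # "Max" and "processes" are fields 1 and 2, so the limits are fields 3 and 4.
    awk '/^Max processes/ { print $3, $4 }' "/proc/$pid/limits"
}
```

On the system described here, running nproc_limits_of_pid against a "java" PID would be expected to print the same pair as the crash output (1024 30527), as long as the JVM has not been restarted with different limits.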
crash> ps -r | grep -e 'NPROC 1024' -B 8 | grep -e PID -e NPROC
PID: 334 TASK: ffff88012c8a4080 CPU: 0 COMMAND: "java"
NPROC 1024 30527
PID: 335 TASK: ffff880101e1c040 CPU: 0 COMMAND: "java"
NPROC 1024 30527
PID: 336 TASK: ffff88010a390aa0 CPU: 0 COMMAND: "java"
NPROC 1024 30527
PID: 453 TASK: ffff8801298e7540 CPU: 0 COMMAND: "java"
NPROC 1024 30527
[..]
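The grep -B 8 pipeline depends on two things. First, the NPROC row must be exactly eight lines after the PID line in ps -r output. Second, the whitespace between NPROC and 1024 must match the pattern literally. The sketch below depends on neither. It assumes the output of crash's ps -r was saved to a text file. It remembers the most recent PID line and prints it together with any NPROC row whose soft limit matches. The function name tasks_with_nproc is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from saved crash "ps -r" output (on stdin), print
# "<pid> <comm> <nproc-soft> <nproc-hard>" for every task whose NPROC soft
# limit equals the first argument (default 1024).
tasks_with_nproc() {
    local want="${1:-1024}"
    awk -v want="$want" '
        $1 == "PID:" {                 # task header line: remember pid and name
            pid = $2
            comm = $NF
            gsub(/"/, "", comm)        # strip the quotes around COMMAND
        }
        $1 == "NPROC" && $2 == want { print pid, comm, $2, $3 }
    '
}
```

For example, tasks_with_nproc 1024 < ps-r.txt | wc -l counts how many tasks carry the 1024 soft limit.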
- Determine the "UID" and "GID" of the process with the highest number of instances (i.e. "java").
crash> set 32724
PID: 32724
COMMAND: "java"
TASK: ffff8801298ce040 [THREAD_INFO: ffff8801030b2000]
CPU: 0
STATE: TASK_INTERRUPTIBLE
crash> task_struct.real_cred ffff8801298ce040
real_cred = 0xffff880139eeb300
crash> cred.uid 0xffff880139eeb300
uid = 702638 <<<----[ User ID ]
crash> cred.gid 0xffff880139eeb300
gid = 626431 <<<----[ Group ID ]
- The "java" processes are running with uid 702638 and gid 626431.
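The vmcore records only the numeric IDs. On the live system, you can map them to names with getent, which also consults any network directory sources configured in /etc/nsswitch.conf. The following is a minimal bash sketch; the function name id_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a numeric uid and gid (as read from the
# vmcore's cred structure) into user and group names on the live system.
id_names() {
    local uid="$1" gid="$2"
    local user group
    user=$(getent passwd "$uid" | cut -d: -f1)
    group=$(getent group "$gid" | cut -d: -f1)
    # Fall back to the numeric value when an id cannot be resolved
    echo "user=${user:-$uid} group=${group:-$gid}"
}
```

Usage: id_names 702638 626431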
- Determine the total number of processes running with uid 702638.
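On a running system, you can take a comparable count from /proc. The kernel's nproc check counts every task (thread) whose real UID belongs to the user, not only processes. For that reason, this sketch counts the entries under /proc/<pid>/task/. It also explains why ps -ef, which prints one line per process, can show a much smaller number than the nproc usage. The function name count_tasks_for_uid is hypothetical. "ps -L -U <uid> --no-headers | wc -l" should give a similar figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the tasks (threads included) whose real UID
# matches the argument, using /proc/<pid>/task/<tid>/status on a live system.
count_tasks_for_uid() {
    local uid="$1" status n=0
    for status in /proc/[0-9]*/task/[0-9]*/status; do
        # The "Uid:" line lists real, effective, saved and filesystem UIDs;
        # the real UID is the first value. Tasks can exit during the scan,
        # so files that cannot be read are skipped.
        if awk -v uid="$uid" '$1 == "Uid:" { found = ($2 == uid) }
                              END { exit !found }' "$status" 2>/dev/null; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Usage: count_tasks_for_uid 702638. Compare the result with the user's soft nproc limit.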
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
10 Comments
Dear Sir,
As suggested by the application team, we restarted the server and bounced the application, and the application is working fine. Please suggest a specific solution; if required, I can upload a screenshot of the limits.conf file.
I made that change and it worked on RedHat 6. On RedHat 5 I see the directory but it is empty. Should I add that file to those servers with the higher limit to prevent a similar occurrence happening on them?
Jerry,
No, it is not required to create that file on RHEL5. You can just add or increase the nproc value in /etc/security/limits.conf on RHEL5 and it works. Setting nproc in /etc/security/limits.conf has no effect in Red Hat Enterprise Linux 6.
Refer to this article: https://access.redhat.com/solutions/146233
OK. Now I am confused, because that file was on RedHat 6 and increasing the value in it fixed the problem.
On RedHat 5 the directory is there but there is no file. My question is: should I add that file with the higher value, or leave it as is?
I checked the system with ps -ef | wc -l, but the number was far too small. However, after modifying /etc/security/limits.d/90-nproc.conf, the problem was fixed.
thx for solution! worked great for us!
Hi, team
May I know how you got the PID 32724 used in "set 32724"? I am referring to this step: crash> set 32724 PID: 32724 COMMAND: "java" TASK: ffff8801298ce040 [THREAD_INFO: ffff8801030b2000] CPU: 0 STATE: TASK_INTERRUPTIBLE
As suggested by the application team, we restarted the server and bounced the application, and the application is working fine. Please provide a permanent solution for this issue.
I restarted the server and bounced the application, and the application worked fine, but after some weeks or months the issue happens again. The limits were also changed, and it still happens about once a month. Please provide a permanent solution for this issue. Thank you in advance!
On RHEL6 I had to also add
session required pam_limits.so
to /etc/pam.d/sshd
so that limits are applied for SSH sessions. Just FYI.