A normal user is unable to log in to the system, with the error "fork: Resource temporarily unavailable"

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6

Issue

  • The su - <user> command fails with the error "Resource temporarily unavailable".
  • A vmcore file was captured during this issue to determine the root cause.
# su - <user>
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable

Resolution

  • Increase the value of the "nproc" parameter for the user, or for all users, in /etc/security/limits.d/90-nproc.conf
    > Here is an example of the /etc/security/limits.d/90-nproc.conf file.
<user>       -          nproc     2048      <<<----[ Only for "<user>" user ]
*          -          nproc     2048      <<<----[ For all users ]

NOTE: The above is only an example snippet; the nproc value should be increased from its current value to double or triple, as per requirements.
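
As a rough illustration of the "double or triple" guidance, the shell sketch below prints a candidate entry for /etc/security/limits.d/90-nproc.conf from a given current limit. The function name nproc_entry and its arguments are hypothetical and not part of any Red Hat tooling; review the printed line before adding it to the file.

```shell
# Hypothetical helper: print a limits.d entry that raises nproc to
# <factor> times the current value (factor defaults to 2).
# Usage: nproc_entry <user-or-*> <current-limit> [factor]
nproc_entry() {
    user="$1"
    current="$2"
    factor="${3:-2}"
    # "-" sets both the soft and the hard limit, as in the example above.
    printf '%s - nproc %d\n' "$user" "$((current * factor))"
}

# Example: double a current limit of 1024 for all users.
nproc_entry '*' 1024
```

The current soft limit can be read by running ulimit -u in the affected user's shell.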

Root Cause

  • The system was not able to create new process(es) because of the limit set for nproc in the /etc/security/limits.conf file.
  • The process(es) started by user "test" with uid (702638) have reached the soft limit for that user.
  • The soft limit for the number of process(es) (nproc) is set to 1024 in the /etc/security/limits.conf file.
  • A total of 1023 process(es) are running on this system with uid (702638).

Diagnostic Steps

  • Determine the total number of process(es) on the system.
crash> ps | wc -l
1370         <<<-----[ Total number of process(es) on the system ]
  • Determine the process name with the highest number of instances.
crash>  ps | gawk '{count[$NF]++}END{for(j in count) print ""count[j]":",j}'|sort -rn|head -n20 
1021: java   <<<-----[ Total 1021 are "java" process(es) ]
64: console-kit-dae
57: klzagent
56: kloagent
7: [kdmflush]
7: [ext4-dio-unwrit]
7: avagent.bin
6: multipathd
6: mingetty
6: elxhbamgrd
5: .vasd
4: rscd
4: collect
4: automount
3: udevd
3: sshd
3: sh
3: rsyslogd
2: sleep
2: sendmail
  • Determine the "NPROC" limit of the process with the highest number of instances (i.e. "java").
crash> set 32724
    PID: 32724
COMMAND: "java"
   TASK: ffff8801298ce040  [THREAD_INFO: ffff8801030b2000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 

crash> ps -r 32724
PID: 32724  TASK: ffff8801298ce040  CPU: 0   COMMAND: "java"
      RLIMIT     CURRENT       MAXIMUM  
         CPU   (unlimited)   (unlimited)
       FSIZE   (unlimited)   (unlimited)
        DATA   (unlimited)   (unlimited)
       STACK    10485760     (unlimited)
        CORE        0        (unlimited)
         RSS   (unlimited)   (unlimited)
       NPROC      1024          30527     
      NOFILE      8192          8192    
     MEMLOCK      65536         65536   
          AS   (unlimited)   (unlimited)
       LOCKS   (unlimited)   (unlimited)
  SIGPENDING      30527         30527   
    MSGQUEUE     819200        819200   
        NICE        0             0     
      RTPRIO        0             0     
      RTTIME   (unlimited)   (unlimited)

crash> ps -r | grep -e 'NPROC      1024' -B 8 | grep -e PID -e NPROC
PID: 334    TASK: ffff88012c8a4080  CPU: 0   COMMAND: "java"
       NPROC      1024          30527   
PID: 335    TASK: ffff880101e1c040  CPU: 0   COMMAND: "java"
       NPROC      1024          30527   
PID: 336    TASK: ffff88010a390aa0  CPU: 0   COMMAND: "java"
       NPROC      1024          30527   
PID: 453    TASK: ffff8801298e7540  CPU: 0   COMMAND: "java"
       NPROC      1024          30527   
[..]
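
On a running system, where no vmcore is available, the same limit can be read from /proc/<pid>/limits. The sketch below assumes the usual "Max processes" line layout of that file (soft limit in the third field, hard limit in the fourth); the function name nproc_limits is hypothetical.

```shell
# Hypothetical helper: extract the soft and hard NPROC limits from the
# contents of /proc/<pid>/limits supplied on standard input.
nproc_limits() {
    awk '/^Max processes/ {print "soft=" $3, "hard=" $4}'
}

# Usage on a live system (replace <pid> with the PID of interest):
#   nproc_limits < /proc/<pid>/limits
```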
  • Determine the "UID" and "GID" of the process with the highest number of instances (i.e. "java").
crash> set 32724
    PID: 32724
COMMAND: "java"
   TASK: ffff8801298ce040  [THREAD_INFO: ffff8801030b2000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE 

crash> task_struct.real_cred ffff8801298ce040
  real_cred = 0xffff880139eeb300

crash> cred.uid 0xffff880139eeb300
  uid = 702638               <<<----[ User ID  ]

crash> cred.gid 0xffff880139eeb300
  gid = 626431               <<<----[ Group ID ]
  • The "java" process(es) are running with uid (702638) and gid (626431).
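
On a live system, the real UID and GID of a process can also be read from /proc/<pid>/status instead of walking task_struct in crash. This is a minimal sketch that assumes the standard "Uid:" and "Gid:" lines, whose first value is the real ID; the function name real_ids is hypothetical.

```shell
# Hypothetical helper: print the real UID and GID found in the contents
# of /proc/<pid>/status supplied on standard input.
real_ids() {
    awk '/^Uid:/ {uid = $2} /^Gid:/ {gid = $2} END {print "uid=" uid, "gid=" gid}'
}

# Usage on a live system (replace <pid> with the PID of interest):
#   real_ids < /proc/<pid>/status
```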

  • Determine the total number of process(es) running with uid (702638).
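
On a live system, a comparable per-UID count can be sketched with ps. Because the nproc limit is charged per task for the real user ID, threads are included by listing them with -L. The function name count_tasks_by_uid is hypothetical.

```shell
# Hypothetical helper: count tasks per real UID from a list of UIDs,
# one per line, on standard input; print "<count> <uid>", largest first.
count_tasks_by_uid() {
    awk '{count[$1]++} END {for (u in count) print count[u], u}' | sort -rn
}

# Usage on a live system (-L lists every thread, ruid= prints the real UID
# without a header line):
#   ps -eLo ruid= | count_tasks_by_uid | head
```

A count close to the soft limit reported by ulimit -u for that user suggests the limit is the cause of the fork failures.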


10 Comments

Dear Sir,

As suggested by the application team, we restarted the server and bounced the application, and the application is working fine. Please suggest a specific solution; if required, I can upload a screenshot of the limits.conf file.

I made that change and it worked on RedHat 6. On RedHat 5 I see the directory but it is empty. Should I add that file to those servers with the higher limit to prevent a similar occurrence happening on them?

Jerry,

No, it is not required to create that file on RHEL5. You can just add or increase the nproc value in the /etc/security/limits.conf file on RHEL5 and it works. Setting nproc in /etc/security/limits.conf has no effect in Red Hat Enterprise Linux 6.

Refer to this article: https://access.redhat.com/solutions/146233

OK. Now I am confused because that file was on RedHat 6 and increasing it fixed the problem
On Redhat 5 the directory is there but no file. My question is should I add that file with the higher value or leave it as is?

I checked the system with ps -ef | wc -l, but the number was far too small. After modifying /etc/security/limits.d/90-nproc.conf, the problem was fixed.

thx for solution! worked great for us!

Hi, team

May I know how you got the PID 32724 used in "set 32724"? I am talking about the step below:
crash> set 32724
    PID: 32724
COMMAND: "java"
   TASK: ffff8801298ce040  [THREAD_INFO: ffff8801030b2000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE

As suggested by the application team, we restarted the server and bounced the application, and the application is working fine. Please provide a permanent solution for this issue.

I restarted the server and bounced the application, and the application is working fine, but after some weeks or months it happens again. The limits were changed as well, and it is still happening, I would say once a month. Please provide a permanent solution for this issue. Thank you in advance!

On RHEL6 I had to also add session required pam_limits.so to /etc/pam.d/sshd so that limits are applied for SSH sessions. Just FYI.