Too many files open causing HTTP in Oracle App Server to hang.
Hello all,
This is my first post, so please forgive any mistakes in protocol.
We have OAS (Oracle App Server) 10g running on RHEL 5.9. Recently the OAS HTTP server hung and had to be restarted. The logs indicated "Too many open files", but didn't give the number that was reached.
OAS is running under user oracle. The limits are all set to default (i.e. no modifications to /etc/security/limits.conf).
ulimit -Hn and ulimit -Sn for user oracle both return 1024.
I figured user oracle must have tried to open more than 1024 files and couldn't, so the HTTP server hung.
Unfortunately, another member of the team restarted OAS to resolve the problem before I could verify how many files oracle had open at the time.
What confuses me is that when I now check the number of open files for user oracle, I get 3393.
[root@xxxxx]# lsof -u oracle |wc
3393 30895 444349
How can user oracle have more than 1024 files open?
I ran the following command to print the number of open files per PID for user oracle.
[root@xxxxx]# lsof -u oracle |awk '{print $2":"$1}' |sort |uniq -c | sort -n
1 PID:COMMAND
15 7554:opmn
22 7613:rotatelog
24 30019:perl
27 7616:rotatelog
27 7617:rotatelog
28 7615:rotatelog
39 7555:opmn
73 7577:java
94 7614:httpd
103 7576:httpd
104 7618:httpd
106 7627:httpd
106 8284:httpd
107 30307:httpd
107 5357:httpd
107 5360:httpd
107 5363:httpd
107 5370:httpd
107 5373:httpd
107 7625:httpd
107 7631:httpd
107 7634:httpd
107 8267:httpd
107 8278:httpd
107 8281:httpd
107 8293:httpd
108 7620:httpd
108 7824:httpd
108 8274:httpd
108 8286:httpd
108 8290:httpd
174 18986:java
255 7578:java
718 7579:java
I'm wondering if the open files limit of 1024 is a "per user" limit or actually a "per user/per PID" limit.
Although the total number of files opened by user oracle is well above 1024, the most open by any PID owned by oracle is 718 for PID 7579.
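One caveat on the lsof numbers above: lsof prints a line for every open file it finds, and that includes entries that are not numbered descriptors (current directory, program text, memory-mapped libraries), so the per-PID line counts overstate what would count against a nofile limit. A minimal sketch that counts actual descriptors from /proc instead follows. The function names are made up for illustration, and reading another user's /proc/PID/fd needs root or that user's own account.

```shell
#!/bin/sh
# count_fds PID: number of file descriptors currently open in one process,
# taken from the entries under /proc/PID/fd.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fds_for_user USER: print "count pid command" for every process owned by
# USER, sorted so the process with the most descriptors comes last.
fds_for_user() {
    for pid in $(pgrep -u "$1"); do
        printf '%s %s %s\n' "$(count_fds "$pid")" "$pid" "$(ps -o comm= -p "$pid")"
    done | sort -n
}
```

Run as root, `fds_for_user oracle` would give a listing comparable to the lsof one, but with each count directly comparable to the value of `ulimit -n`.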
Please advise.
Thank you
Responses
Mirza,
Have you tried increasing the open file limit in /etc/security/limits.conf? You mention the limits are at their defaults; is there a reason for leaving them that way?
I believe the recommendations differ slightly depending on the Oracle version. This is the nofile example from the Oracle documentation for an 11g installation (http://docs.oracle.com/cd/B28359_01/install.111/b32281/toc.htm#LADQI101):
oracle soft nofile 1024
oracle hard nofile 65536
There are likely similar recommendations for the user that httpd runs as in the installation documentation to resolve this issue.
After re-reading the question, I realise it's not so much about solving the problem as about understanding how the limits are applied. Apologies if my answer was off the mark.
Your question led me on a bit of a research expedition...
The limits.conf man page states the following (man limits.conf):
Also, please note that all limit settings are set per login. They are not global, nor are they permanent; existing only for the duration of the session.
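Because the limits are set per login session, a daemon that is already running keeps whatever limits it was started with, and those may differ from what `ulimit -n` prints in a fresh shell. The sketch below reads the values the kernel is enforcing for one running process. It assumes the kernel exposes /proc/PID/limits, which may not be the case on older releases, and the function name is made up for illustration.

```shell
#!/bin/sh
# nofile_limits PID: print the soft and hard "Max open files" values
# in effect for a running process, as reported in /proc/PID/limits.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}
```

For example, `nofile_limits 7579` would show the limits of the busiest java process from the listing in the question.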
Reading further, ulimit is implemented with the setrlimit system call (man setrlimit); for open files the relevant resource is RLIMIT_NOFILE:
RLIMIT_NOFILE
Specifies a value one greater than the maximum file descriptor number that can be opened by this process.
This section of the man page is also interesting:
NOTES
A child process created via fork(2) inherits its parent’s resource limits. Resource limits are preserved across execve(2). One can set the resource limits of the shell using the built-in ulimit command (limit in csh(1)). The shell’s resource limits are inherited by the processes that it creates to execute commands.
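The inheritance described in that note can be seen with a small experiment. This is only a sketch, and the function name is made up for illustration: it lowers the soft limit inside a subshell, then starts a new shell from it and asks that child which limit it sees.

```shell
#!/bin/sh
# child_soft_nofile N: lower the soft open-files limit to N in a subshell,
# then start a child shell and print the limit that child reports.
# The parentheses keep the change from leaking into the calling shell.
child_soft_nofile() {
    (
        ulimit -S -n "$1" || exit 1
        sh -c 'ulimit -S -n'
    )
}
```

Calling `child_soft_nofile 256` prints 256 as long as the hard limit is at least 256, while `ulimit -S -n` in the calling shell still reports its original value. Each process carries its own copy of the limit, and that copy applies to its own descriptors.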
Hope this answer is closer to the mark!
