HTTP 503 error repeatedly causing website to crash on RHEL 5.10
Hello,
We have a ColdFusion 9 website sitting on a server running RHEL 5.10 and Apache 2.2.3. The website accesses a PostgreSQL 8.4 database from just a few of its pages; the majority of the pages are static.
We'd never had any trouble before last Tuesday, when the website suddenly started crashing and giving 503 errors. We had to restart the ColdFusion and Apache services to bring it back up, but it would crash again in 1-3 hours. At first we thought it was a ColdFusion issue, but there were no errors in any of the CF logs; there were, however, a lot of 503 errors in the httpd access_log. We noticed that the majority of the 503 errors were on the 2 pages of the website that do the most database access.
Discussion with the server hosting company suggested our Apache was not able to handle the number of requests coming in. This is a very busy time of year for our site, and it would be getting more hits than normal on those pages. We did get some suggestions from Adobe support for tuning our ColdFusion server, but making those changes did not help. We then looked into how to tune Apache and came upon a suggestion to increase MaxClients.
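In case it's useful, this is roughly how we confirmed which pages were getting the 503s (a sketch; it assumes the default combined log format, and the log path is from our server):
awk '$9 == 503 {print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -rn   # count 503s per URL, busiest first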
We had this in our httpd.conf:
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
So we changed it to this:
ServerLimit 16
StartServers 2
MaxClients 400
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
Then we restarted the httpd service. (We added ServerLimit because, under the worker MPM, MaxClients can be at most ServerLimit × ThreadsPerChild: 16 × 25 = 400.) This did not seem to help at all, so the next day we changed it again, to this:
ServerLimit 40
StartServers 2
MaxClients 1000
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
and again restarted the httpd service (this time 40 × 25 = 1000). This seemed to slow the 503 errors down somewhat, but the website was still crashing multiple times a day.
Finally we took the 2 pages that were getting the most traffic (the ones doing the most database access) and moved them to another server, leaving a script in their place that just redirects to the other server. This resolved the issue for the time being; the website stopped crashing and we've had no more 503 errors. However, we'd like to be able to tune Apache to handle the load so we can keep those pages on the server where the website resides. Does anyone have any other httpd.conf settings to suggest? (BTW, we are planning to upgrade the server to RHEL 6.5, CF 10, and Postgres 9.2, but have to wait until September for funding.)
Many thanks,
Julie
Responses
Julie,
I think you really need to narrow down / profile the issue further before you can start tuning the Apache config (and other components).
I am assuming you are using the prefork MPM, which ships as the default in Apache's stock configuration (from memory).
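If you're not sure, you can check which MPM the running binary was built with; on RHEL 5 something like:
httpd -V | grep -i mpm    # reports e.g. "Server MPM: Prefork"
httpd -l                  # lists compiled-in modules (prefork.c or worker.c)
(On RHEL the worker MPM is a separate binary, httpd.worker, normally selected via /etc/sysconfig/httpd, so prefork is what you get unless that was changed.)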
Do you have performance information from the server at time of the issue?
i.e. memory usage, CPU usage, swap usage, number of open connections, etc.?
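For a quick snapshot while the problem is happening, something along these lines would do (standard RHEL 5 tools; the exact flags are illustrative):
free -m                               # memory and swap usage
vmstat 5 5                            # CPU, run queue, swap activity
netstat -ant | grep -c ESTABLISHED    # count of open TCP connections
ps -C httpd --no-headers | wc -l      # number of httpd processes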
Do you record render times etc. for pages?
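If not, Apache itself can record the time taken to serve each request via the %D token (microseconds) in a LogFormat; a sketch for httpd.conf ("timed" is just an illustrative nickname):
LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
CustomLog logs/timed_log timed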
Are there concurrent user limits in other components in the stack?
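For example, PostgreSQL caps concurrent connections with max_connections, and ColdFusion has its own "Maximum number of simultaneous requests" setting in the CF Administrator. A quick check on the Postgres side (connection details are placeholders):
psql -U postgres -c 'SHOW max_connections;'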
Have you had an instance where the server recovered from the issue by itself, or do you always need to restart the services to recover?
Do you collect longer-term stats with sysstat / sar for the server? If so, do they show memory usage creeping up over time?
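If sysstat is running, the daily files under /var/log/sa can be replayed after the fact; e.g. (the day number is just an example):
sar -r -f /var/log/sa/sa15    # memory / swap usage for the 15th
sar -q -f /var/log/sa/sa15    # load average and run queue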
From the configuration you have posted, I would look at tuning 'MaxRequestsPerChild'. A non-zero value protects you against memory leaks in components by restarting child processes regularly (after they have served that many requests); your current setting of 0 means children are never recycled. This will help if your issue is caused by a leaky app, but without knowing where the bottleneck is, it is just a general suggestion.
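For example (the value is illustrative, not a recommendation; tune it to your request volume):
MaxRequestsPerChild 1000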
