HTTP 503 error repeatedly causing website to crash on RHEL 5.10

Hello,
We have a ColdFusion 9 website on a server running RHEL 5.10 and Apache 2.2.3. The website accesses a PostgreSQL 8.4 database from just a few of its pages; the majority of the pages are static. We never had any trouble before last Tuesday, when the website suddenly started crashing and returning 503 errors. After restarting the ColdFusion and Apache services it would come back up, but it would crash again within 1-3 hours. At first we thought it was a ColdFusion issue, but there were no errors in any of the CF logs, while the httpd access_log showed a lot of 503 errors. We noticed that the majority of the 503 errors were on the 2 pages of the website that do the most database access.

Some discussion with the server host company suggested our Apache was not able to handle the volume of requests coming in; this is a very busy time of year for our site, and those pages would be getting more hits than normal. We did get some suggestions from Adobe support for tuning our ColdFusion server, but making those changes did not help. We then looked into tuning Apache and came across a suggestion to increase MaxClients.
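For reference, counting the 503s per URL in the access_log is what pointed us at those 2 pages; something along these lines works, assuming the stock RHEL log location and the combined log format:

# Status code is field 9 and request path field 7 in the combined format;
# adjust the log path if yours lives elsewhere
awk '$9 == 503 {print $7}' /var/log/httpd/access_log | sort | uniq -c | sort -rn | head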

We had this in our httpd.conf:

StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0

So we changed it to this:

ServerLimit 16
StartServers 2
MaxClients 400
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0

Then we restarted the httpd service. This did not seem to help, so the next day we changed it again, to this:

ServerLimit 40
StartServers 2
MaxClients 1000
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0

and again restarted the httpd service. This seemed to slow the 503 errors down somewhat, but the website was still crashing multiple times a day.
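(Worth noting: under the worker MPM these values must satisfy MaxClients ≤ ServerLimit × ThreadsPerChild, and both revisions do: 16 × 25 = 400 and 40 × 25 = 1000. If these lines sit in the stock <IfModule worker.c> section of httpd.conf, though, they only take effect when Apache actually runs the worker MPM; under the RHEL default prefork MPM, the prefork section's own MaxClients is what counts.)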

Finally, we took the 2 pages that were getting the most traffic (the ones that do the most database access) and moved them to another server; the scripts left on this server now just redirect to the other one. This resolved the issue for the time being: the website stopped crashing and we've had no more 503 errors. However, we'd like to be able to tune Apache to handle the load so we can keep those pages on the server where the website resides. Does anyone have any other httpd.conf settings to suggest? (BTW, we are planning to upgrade the server to RHEL 6.5, CF 10, and Postgres 9.2, but have to wait until September for funding.)
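In case the stopgap is useful to someone else: we did the redirect inside the ColdFusion scripts themselves, but the same effect is available at the Apache layer with mod_alias. A minimal sketch, with made-up page names and target host:

# Hypothetical paths and hostname; "temp" issues a 302 so clients don't cache the move as permanent
Redirect temp /busy-page-1.cfm http://www2.example.edu/busy-page-1.cfm
Redirect temp /busy-page-2.cfm http://www2.example.edu/busy-page-2.cfm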

Many thanks,
Julie

Responses

Julie,

I think you really need to narrow down / profile the issue further before you can start tuning the Apache config (and other components).

I am assuming you are using the prefork MPM, which (from memory) is the default in the stock Apache configuration.

Do you have performance information from the server at the time of the issue, i.e. memory usage, CPU usage, swap usage, number of open connections, etc.?

Do you record render times etc. for pages?
Are there concurrent user limits in other components in the stack?
Have you had an instance where the server has recovered from the issue by itself, or do you always need to restart the services to recover?

Do you collect longer-term stats with sysstat / sar for the server? If so, do they show memory usage creeping up over time?
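If none of that is being collected yet, even a quick snapshot taken while the 503s are happening tells a lot; something along these lines, using tools that ship with RHEL 5 (sysstat is a yum install away if absent):

# Memory, swap, and load at the moment of trouble
free -m
uptime
vmstat 5 3
# Rough count of established client connections to Apache on port 80
netstat -ant | grep ':80 ' | grep -c ESTABLISHED
# If sysstat is installed (yum install sysstat), day-long history:
sar -r    # memory usage
sar -q    # run queue / load average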

From the configuration you have posted, I would look at tuning 'MaxRequestsPerChild'. It helps protect you against memory leaks in components by restarting child processes regularly (after each has served a set number of requests). This will help if your issue is caused by a leaky app, but without knowing where the bottleneck is, it is just a general suggestion.
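A sketch only, since the right number depends on traffic volume and how quickly memory creeps; this assumes the prefork section of httpd.conf and an arbitrary starting value:

<IfModule prefork.c>
    # 0 means never recycle children; a finite value bounds the damage a leak can do.
    # 1000 here is a placeholder to tune from, not a recommendation.
    MaxRequestsPerChild 1000
</IfModule>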

Hello and thank you for your reply. I am sorry that I cannot answer most of your questions right now, but I can address a few:

I believe that yes, we are using the default prefork MPM, as I see this in my httpd.conf file:

# prefork MPM

StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000
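(From what I can tell, the direct way to confirm which MPM the binary actually uses, rather than inferring it from httpd.conf, is to list its compiled-in modules:

# prefork.c in the output means the prefork MPM; worker.c would mean worker
httpd -l

On RHEL 5 the worker MPM ships as a separate binary, httpd.worker, selected through /etc/sysconfig/httpd, so the stock httpd is prefork.)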

Do you have performance information: if I do, I do not know where to find it. It would have to be something that comes with Apache or Red Hat by default, as we have not installed anything extra on this server. I will research that.

Record render times for pages: not that I know of, unless, again, that is done by default by Apache or Red Hat.

Concurrent user limits in other components in the stack: sorry, I do not know what that means or how to check for it.

Yes, there are times when 503 errors occur but the website does not crash and we do not have to restart the services. Usually after a restart things would be fine for a while (1-3 hours, perhaps), and then we would start seeing 503 errors. If there were just a few, the website would stay up in spite of them, but after a while there would be so many that the site would go down. Every time it went down, of course, we had to restart the services.

sysstat/sar: no, this is not installed on this server.

MaxRequestsPerChild: thank you for the suggestion, we will research how to tune that for our situation.

Will try to get back to you as time allows with answers to more of your questions, if we can find the information. Thanks again.
Julie

Hello,
We did look into MaxRequestsPerChild and ours is set to 4000 for the prefork MPM, which, based on the description, seems to be OK.

I ran ApacheBench on the server and here is the output; hopefully it fills in some of the missing information and helps narrow down the problem:

[root@online ~]# ab -k -c 350 -n 20000 online.ctcd.edu/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking online.ctcd.edu (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Finished 20000 requests

Server Software: Apache
Server Hostname: online.ctcd.edu
Server Port: 80

Document Path: /
Document Length: 19737 bytes

Concurrency Level: 350
Time taken for tests: 6.278392 seconds
Complete requests: 20000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 397419604 bytes
HTML transferred: 394799211 bytes
Requests per second: 3185.53 [#/sec] (mean)
Time per request: 109.872 [ms] (mean)
Time per request: 0.314 [ms] (mean, across all concurrent requests)
Transfer rate: 61815.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   12  189.3      0    3002
Processing:     1   71  449.2     22    6255
Waiting:        1   71  449.1     21    6254
Total:          9   84  492.9     22    6277

Percentage of the requests served within a certain time (ms)
  50%     22
  66%     24
  75%     27
  80%     29
  90%     53
  95%     75
  98%    585
  99%   2063
 100%   6277 (longest request)
[root@online ~]#
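One caveat when reading this output: if '/' is one of the static pages, the run above mostly exercises Apache itself rather than the ColdFusion/PostgreSQL path where the 503s actually showed up. A comparable run against one of the database-backed pages might show the bottleneck more directly; the page name below is made up:

# Hypothetical page; lower concurrency to start, since every request hits the database
ab -c 50 -n 2000 http://online.ctcd.edu/db-heavy-page.cfm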

Thanks again,
Julie
