Server hangs up


How do I troubleshoot a RHEL server that hangs because of a CPU spike? How can I identify which process is causing the spike on the server, and where should I look?

Responses

You did not specify the RHEL release (6 or 7), but in general dmesg shows any hardware-related events, journalctl may help on RHEL 7, and /var/log/messages is worth checking on both RHEL 6 and RHEL 7.
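As a rough illustration of that log review, a small shell sketch like the one below could scan a saved log for a few kernel messages that often show up around hangs. The function name `scan_hang_msgs` and the pattern list are assumptions made for this example; the list is not exhaustive.

```shell
#!/bin/sh
# scan_hang_msgs FILE
# Print matching lines (prefixed with their line number) from a log file.
# Hypothetical helper: the patterns are a small, non-exhaustive guess at
# messages worth looking for (soft lockups, hung tasks, OOM kills, traces).
scan_hang_msgs() {
    grep -n -i -E 'soft lockup|blocked for more than [0-9]+ seconds|out of memory|oom-killer|call trace' "$1"
}

# Possible usage (run as root):
#   scan_hang_msgs /var/log/messages
#   dmesg > /tmp/dmesg.txt && scan_hang_msgs /tmp/dmesg.txt
#   journalctl -k > /tmp/kernel.txt && scan_hang_msgs /tmp/kernel.txt   # RHEL 7
```

Note that dmesg and, unless the journal is configured to be persistent, journalctl only cover the current boot, so after a forced reboot /var/log/messages is usually the place where messages from before the hang survive.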

It's on RHEL 6. However, steps for RHEL 7 would also be good to know.

Tracing the root cause of high CPU usage is an interesting problem, and the approach may vary from person to person.

Yes, as Zdenek suggested, the messages file is a good starting point for an initial trace.

Btw, how did you conclude that the server was hung because of high CPU usage?

Normally, high CPU usage means that the processor is either busy processing system-related tasks or just waiting for I/O from disk. So you need to determine whether it was caused by I/O wait or something else. Also, make a note of any changes made before the server hung, as they could help in analyzing the issue.

So, I would use the sar file generated on the day the system hung, which helps in understanding system activity before it became unresponsive. To see CPU stats for the 28th of April, I would run this command:

LANG=C /usr/bin/sar -u -f /var/log/sa/sa28

Here the columns of interest are "%iowait", "%user", and "%system".
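To avoid reading a full day of samples by eye, the sar output can be filtered for intervals that look busy. The following is a minimal sketch: the function name `busy_intervals` and the default thresholds (20 for %iowait, 10 for %idle) are arbitrary assumptions for illustration and should be tuned for the system in question.

```shell
#!/bin/sh
# busy_intervals [IOWAIT_MAX] [IDLE_MIN]
# Read "sar -u" text on stdin and print the samples where %iowait is above
# IOWAIT_MAX or %idle is below IDLE_MIN.
# Hypothetical helper; the defaults 20 and 10 are assumptions, not advice.
busy_intervals() {
    awk -v iow="${1:-20}" -v idle="${2:-10}" '
        # Header row: remember which columns hold %iowait and %idle.
        /%iowait/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "%iowait") w = i
                if ($i == "%idle") d = i
            }
            next
        }
        # Skip the "Average:" summary and blank lines.
        /^Average/ || NF == 0 { next }
        # Data row with numeric values: print it if a threshold is crossed.
        w && d && $d ~ /^[0-9.]+$/ && ($w + 0 > iow + 0 || $d + 0 < idle + 0) { print }
    '
}

# Possible usage:
#   LANG=C /usr/bin/sar -u -f /var/log/sa/sa28 | busy_intervals 20 10
```

The column positions are taken from the header line rather than hard-coded, so the filter should not depend on whether the timestamp is printed in 12-hour or 24-hour form.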

This command, which reports the same statistics per CPU, would also help:

LANG=C /usr/bin/sar -P ALL -f /var/log/sa/sa28

Likewise, if you wish to check disk stats using sar data:

LANG=C /usr/bin/sar -dp -f /var/log/sa/sa28

Load average statistics:

LANG=C /usr/bin/sar -q -f /var/log/sa/sa28

Memory usage stats:

LANG=C /usr/bin/sar -r -f /var/log/sa/sa28

Swap stats:

LANG=C /usr/bin/sar -S -f /var/log/sa/sa28

The sar command has many more options; check the man page for more information.

If the system is configured to capture vmcore files, check the dump for further details:

https://access.redhat.com/articles/1406253
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s1-kdump-crash.html

The CPU spike caused alerts in vCenter.

This is a nice article that explains the steps to find the processes causing I/O wait: https://access.redhat.com/solutions/288803

It may not be relevant to your concern right now, but it is worth bookmarking.
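Independently of that article, if the server is still reachable while the spike is happening, ps can give a quick view of which processes are involved. This is only a sketch: the helper name `d_state_tasks` is made up for this example, and it simply keeps tasks in uninterruptible sleep (state D), which are typically the ones waiting on I/O.

```shell
#!/bin/sh
# d_state_tasks
# Read ps output on stdin (first column must be the process state) and keep
# the header line plus tasks whose state starts with D (uninterruptible sleep).
# Hypothetical helper written for illustration.
d_state_tasks() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Possible usage on a live system (procps ps, as shipped with RHEL):
#   ps -eo stat,pid,pcpu,comm | d_state_tasks
#
# Top CPU consumers, highest first:
#   ps -eo pid,pcpu,pmem,stat,comm --sort=-pcpu | head -n 11
```

A single ps snapshot can miss short-lived processes, so running these a few times during the spike gives a more reliable picture.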
