How do we tell if a server crashed?

I'm trying to find a keyword or phrase in any of the /var/log/* files that my Splunk event manager could pick up on to tell me if a server crashed overnight and rebooted.

I don't want to just check for reboots; we do those WAY too often these days with all the patching going on.

What do I search on to find this event? Is there anything to configure in kdump that'll help?

Responses

A couple of possibilities; all of them will carry the kernel: syslog tag:
* Look for kernel: Linux version
* Look for kernel: Command line:
* Possibly kernel: imklog .* started.
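
As a rough sketch, the three patterns above could be combined into one grep. The function name find_boot_markers and the default path /var/log/messages are assumptions for illustration, and the exact message layout varies by release. Note that these lines appear on every boot, planned or not, so a match only says the system started.

```shell
#!/bin/sh
# Print syslog lines that normally show up once per boot.
# Assumption: kernel messages are written to /var/log/messages
# with a "kernel:" tag; adjust the path for your syslog setup.
find_boot_markers() {
    logfile="${1:-/var/log/messages}"
    # ".*" before the first two phrases tolerates an optional
    # "[    0.000000]" style timestamp after the tag.
    grep -E 'kernel: (.*Linux version|.*Command line:|imklog .* started)' "$logfile"
}
```

Calling `find_boot_markers` with no argument scans the default file; pass another log path as the first argument.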

I'm getting a lot of those possibilities too. I'm hoping to find a good correlation to the dump event.

Thanks for the ideas!

I was hoping for a neon sign that said, "Hey, I crashed, see crash dump file."

:-)

I'm specifically crashing due to STIG implementations, aka "audit events exceeded". Because we have to set the failure mode to panic ("-f 2" in audit.rules), the system stops all logging for the few seconds while it's panicked and/or rebooting. It's difficult to see the event occurring when no logging is occurring on the system - yes?

As I'm new to RHEL systems crashing in our environment, I can only assume that if a hardware error or an application caused a system panic, we could easily see somewhere in the log files that the system went down for a panic. But not in this instance. Is this true?

My solution is quite easy, but I'm sure there's a "cleaner way" to do this.

I'm having rc.local run a tiny script at boot to see if there is anything in /var/crash/. If so, it sends a custom message to logger, so our enterprise event logger can capture that crash message easily enough. Once we analyze the dump file(s), we can, and should, clean up /var/crash/.
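
A minimal sketch of what such a boot-time check might look like follows. The function name list_crash_dumps, the keyword SERVER_CRASH_DETECTED, and the crashcheck tag are made up for illustration; the original script was not posted. It assumes kdump writes its dumps under /var/crash.

```shell
#!/bin/sh
# Hypothetical helper intended to be called from rc.local.
# Prints one line per entry found in the crash directory.
list_crash_dumps() {
    crash_dir="${1:-/var/crash}"
    for entry in "$crash_dir"/*; do
        # If the directory is empty or missing, the glob stays
        # unexpanded and names nothing that exists: skip it.
        [ -e "$entry" ] || continue
        echo "SERVER_CRASH_DETECTED dump=$entry"
    done
}

# Example call from rc.local (logger reads the lines from stdin):
#   list_crash_dumps | logger -p user.crit -t crashcheck
```

The event manager can then search for the keyword. Since the message repeats on every boot until /var/crash/ is cleaned up, the cleanup step matters.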

We'll have our configuration manager tool push out the new script and modify the rc.local file for all systems.

Thoughts? What's a better way? I know this cat can be skinned in four different ways at least - nothing against cats mind you. :-)

Thanks,

Unfortunately, there is no single perfect way to detect a server crash event. If the system crashes unexpectedly, /var/log will not contain any clue about it.

You can configure kdump and enable the related sysctl parameter, which will help with capturing a vmcore.

We created an rc script that used its own "signal" file in /var/run. In the stop action during shutdown, this file was removed. In the start action during boot, an alert was raised if this file was still present; otherwise the file was created so that it could be checked at the next boot.
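
The signal-file idea could be sketched roughly as below. The function name shutdown_flag, the alert text, and the default flag path are hypothetical, not the script described above. One assumption to check: on systems where /var/run is a tmpfs, the file would not survive a reboot, so this sketch defaults to a path under /var/lib instead.

```shell
#!/bin/sh
# Hypothetical start/stop handler for an init script.
#   start: alert if the flag survived from the previous boot,
#          otherwise create it for the next boot to check.
#   stop:  remove the flag, marking the shutdown as clean.
shutdown_flag() {
    action="$1"
    flag="${2:-/var/lib/crashcheck/running.flag}"   # assumed path
    case "$action" in
        start)
            if [ -e "$flag" ]; then
                echo "UNCLEAN_SHUTDOWN_DETECTED flag=$flag"
            else
                mkdir -p "$(dirname "$flag")" && : > "$flag"
            fi
            ;;
        stop)
            rm -f "$flag"
            ;;
        *)
            echo "usage: shutdown_flag {start|stop} [flagfile]" >&2
            return 2
            ;;
    esac
}
```

Unlike the /var/crash check, this flags any unclean shutdown, including power loss, and does not depend on a core file having been written.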

I can already run a report from my desk that uses ssh to log in to each server and tell me if there is anything in the crash directory. This works quickly with public/private keys, so I can tell at a glance if something crashed.
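
A report like that might look something like the following sketch. The function name crash_report and the output wording are invented here, and it assumes key-based ssh logins already work for every host passed in.

```shell
#!/bin/sh
# Hypothetical report: one status line per host given as an argument.
crash_report() {
    for host in "$@"; do
        # BatchMode makes ssh fail instead of prompting for a password;
        # "|| true" keeps a missing /var/crash from looking like an ssh error.
        if listing=$(ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" \
                'ls -A /var/crash 2>/dev/null || true'); then
            if [ -n "$listing" ]; then
                echo "$host: CRASH DUMPS PRESENT"
            else
                echo "$host: clean"
            fi
        else
            echo "$host: unreachable"
        fi
    done
}
```

For example, `crash_report web01 web02 db01` (placeholder host names) prints three lines, one per server.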

Thanks,

That doesn't so much inform you of every crash, however. What it tells you is:
* The server has been configured to collect core files (many organizations explicitly disable this for various reasons)
* A server that was configured to collect crash cores was actually able to recover a core file post-crash, which doesn't happen 100% of the time.
