System reboots itself

I have a physical server running RHEL 6; it is part of a cluster.
The issue is that the server restarts by itself. When I check the logs, there is no message that indicates the problem. Any idea why it does this?

Responses

Hi Infra Alsea,

A few questions....

  • What specific version of Red Hat Enterprise Linux (RHEL) are you using? - I see you mentioned it is RHEL 6, but is it perhaps 6.10, 6.9, 6.8, 6.7, or something lower?
  • Were any changes made recently? For example, did you happen to update the system, load additional software, or make changes to some configuration file?
  • When did the system last work without this issue, and have you had an opportunity to talk with anyone who might have introduced a change?
  • Have you had an opportunity to examine the history of root to see what files (configuration) may have been edited?
  • When the system reboots itself, examine /var/log/messages to see what is happening right around the time of the reboot... - let us know.
  • How often does it reboot, and how much time passes between reboots? Is it a reasonably consistent interval (every minute, every 25 minutes, immediately after boot) or irregular?
  • Is the cluster using something called "STONITH" (Shoot The Other Node In The Head, this is really a thing)? Some clusters use it to fence a node that looks unresponsive by power-cycling it, which would look exactly like a server rebooting itself.
  • Is there anything in the cluster that shows some segment of the clustered software or services failing?
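
To make the "what happened right before the reboot" question easier to answer, here is a minimal sketch that prints the last few lines logged before each boot. It assumes each boot can be recognized in /var/log/messages by the kernel's "Linux version" banner; the function name context_before_boot is just an illustrative helper, not a standard tool.

```shell
# Print the N lines logged just before each boot marker in a syslog file.
# Assumption: a fresh boot shows up as a "kernel: Linux version ..." line.
context_before_boot() {
    local logfile="${1:-/var/log/messages}"   # log file to search
    local lines="${2:-20}"                    # lines of context before each boot
    grep -B "$lines" 'kernel: Linux version' "$logfile"
}
```

Run it as root with `context_before_boot /var/log/messages 30`. If the lines before the banner show no shutdown messages at all, the reset was probably not a clean, OS-initiated reboot. Running `last -x reboot shutdown` alongside it should also show how regularly the reboots happen.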

Run this (if your server's default runlevel is 5 rather than 3, change the 3 below to 5):

egrep default /etc/inittab
chkconfig --list | egrep 3:on

The second command lists every service that is enabled at that runlevel. Check whether each of them is actually running; a service that ought to be on but has failed is a place to look.
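
The check described above can be sketched roughly as follows. The two function names are illustrative helpers, and the sketch assumes every init script answers `service NAME status` with a non-zero exit code when it is not running, which is not true of all scripts (one-shot services in particular), so treat the output as a hint only.

```shell
# Read `chkconfig --list` output on stdin and print the names of services
# that are enabled at the given runlevel (default 3).
services_on_at_runlevel() {
    local runlevel="${1:-3}"
    awk -v want="${runlevel}:on" \
        '{ for (i = 2; i <= NF; i++) if ($i == want) print $1 }'
}

# List enabled services whose init script does not report a running state.
# Assumption: a non-zero exit from "service NAME status" means "not running".
report_stopped_services() {
    local runlevel="${1:-3}"
    chkconfig --list | services_on_at_runlevel "$runlevel" |
    while read -r name; do
        service "$name" status >/dev/null 2>&1 || echo "not running: $name"
    done
}
```

As root, `report_stopped_services 3` prints one line per enabled service that does not report itself as running.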

Become root in a terminal, navigate to /var/log using cd /var/log, and run ls -ltr to see which logs were written to most recently... and examine those. I suspect you've done some of this.
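
As an alternative to eyeballing ls -ltr, a small sketch like this lists only the files changed within a recent window, including those in subdirectories. The function name recent_logs and the 60-minute default are arbitrary choices for illustration.

```shell
# List regular files under a directory that were modified in the last N minutes.
recent_logs() {
    local dir="${1:-/var/log}"     # directory to search
    local minutes="${2:-60}"       # look-back window in minutes
    find "$dir" -type f -mmin "-$minutes" 2>/dev/null
}
```

Right after the server comes back up, `recent_logs /var/log 15` narrows things down to the files that were touched around the reboot.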

Examine memory usage - run a top command and examine what's happening - perhaps around the time the system is rebooting. We have no idea if the reboots are immediate or if there is some time in between reboots to examine things, or monitor things.
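
Since nobody may be watching top at the moment the server resets, one option is to record memory figures to a file at regular intervals and read them after the next reboot. This is only a sketch: the function name snapshot_memory and the output path used below are hypothetical, and it reads /proc/meminfo directly rather than parsing top.

```shell
# Append a timestamp and a few key memory counters to a log file.
# The second argument exists only so the source file can be swapped out.
snapshot_memory() {
    local outfile="$1"
    local meminfo="${2:-/proc/meminfo}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' "$meminfo"
    } >> "$outfile"
    sync   # flush to disk so the last sample is more likely to survive a sudden reset
}
```

For example, `while true; do snapshot_memory /var/log/memwatch.log; sleep 60; done` run in the background as root takes one sample per minute. If free memory and swap shrink steadily before each reboot, memory exhaustion becomes a stronger suspect.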

Please reread your post above and imagine you are an outsider to the situation. Ask yourself whether the information you have presented is really sufficient to diagnose the problem, and what additional details someone unfamiliar with your environment would need in order to help you resolve it.

Lastly, for your own reference, examine your issue in light of these solutions, which are broad in scope (we really do need more info for your specific situation).

Wish you well with diagnosing/resolving this issue. Please carefully consider this and other things that might help us help you.

Kind Regards

RJ

Oh, and what kind of clustered service does this server provide? What cluster software is running? See my last post.

Thanks

RJ

Hi "Infra ALSEA",

RJ is so right, you've provided absolutely no information about anything. Thankfully, RJ took some of his spare time to give you
a lot of useful hints and he did this right by guessing about your specific circumstances ... I agree : please help us to help you ! :)

Regards,
Christian

Hi.

We have Red Hat Enterprise Linux Server version 6.5 (Santiago). We have not made any changes recently. This has been going on for a long time, and we are wondering whether it is a hardware issue: as mentioned, when we check the logs we only see the restart itself, never the cause of the crash.

The result of the commands is this:

[user@srfbd001 ~]$ egrep default /etc/inittab
inittab is only used by upstart for the default runlevel.
0 - halt (Do NOT set initdefault to this)
6 - reboot (Do NOT set initdefault to this)
id:3:initdefault:
[user@srfbd001 ~]$

This server runs an Oracle 12 database.

The cluster has 3 identical nodes, and this problem does not occur on the other nodes.

STONITH is not present on the servers.

Do you need any other information?

Regards.

Hi ! :)

Before we start digging deeper here - please upgrade to the latest stable minor release of RHEL 6, which
currently is 6.10 - you are using a completely outdated system (one main reason why you are facing issues) -
the best you can do would be to install the latest stable edition of RHEL 7, which currently is RHEL 7.6 ! Also it's
good practice to generally keep operating systems updated - and you are 5 (five !!!) point releases behind ...

Regards,
Christian

Hello Infra ALSEA

Along with what Christian said, examine the 3 solution articles I provided in light of your current situation

Also, if you do upgrade your server running an Oracle Database, make sure to cleanly shut down the Oracle database before the operating system upgrade. Make sure you have sufficient space in /boot to take another kernel.
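
The /boot space check can be sketched like this. The 100 MB default is only an assumed safety margin, not a documented requirement, and both function names are illustrative.

```shell
# Print the free space, in megabytes, on the filesystem holding the given path.
free_mb() {
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem has at least the requested number of free megabytes.
# Assumption: 100 MB is a comfortable margin for one more kernel and initramfs.
has_room_for_kernel() {
    local path="${1:-/boot}"
    local need_mb="${2:-100}"
    local avail
    avail=$(free_mb "$path")
    [ -n "$avail" ] && [ "$avail" -ge "$need_mb" ]
}
```

Before updating, something like `has_room_for_kernel /boot 100 || echo "free up space in /boot first"` gives a quick yes/no answer.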

Take a sane approach to upgrading since you have an Oracle database. We do not know your situation or the experience base for your Oracle administrators. Take your time, assess it, have a backup of the Oracle database and be confident you can restore. When I worked with a project with RHEL 6 - generally the upgrades went just fine. That being said, it's good to take precautions.

Regards,

RJ

Infra ALSEA

I see you added more info, but re-read my post (and Christian's) and evaluate the things you didn't reply to from my post.

Regards

RJK
