Hosted Engine Resets Every 10 Minutes


Hello

We have a RHEV 4.4 system running on two Dell 740s, RHEV01 and RHEV02. I went to apply an update to RHEV01, so I migrated the Hosted Engine to RHEV02 and halted the domain VMs.

The update completed, but it turns out I applied the wrong update, and now RHEV01 is down and out. All the VMs could be started on RHEV02, so the system seems usable for now.

Apart from the trouble above, the Hosted Engine seems to reset every 10 minutes due to Power Management. We lose connectivity for a minute or so and then have to log in again.

I have tried a few things like:

  1. Put the dead host into maintenance mode in the manager; it does show as being in maintenance mode.

  2. Unplugged the iDRAC network connection.

  3. Unchecked the Power Management check box for the dead host.

None of that worked. Maybe I need to do #3 on the host the Hosted Engine is running on?

I also see mention of Fencing. That might be the real issue, as I do see it configured under Advanced Features. It shows Cluster followed by dc.

Also, every 10 minutes this is logged:
"rhev manager event execution of power management status on host RHEV01 using proxy host RHEV02 and fence agent ipmilan."
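
One way to check whether the resets really line up with this event is to compare the times at which it is logged. The following is only a minimal Python sketch, assuming the event times have been copied out of the manager's event list by hand; the function names, the timestamp format, and the sample timestamps are all hypothetical and not taken from any RHEV tool.

```python
from datetime import datetime, timedelta

def intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps between consecutive event times, oldest first."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]

def is_periodic(timestamps, period=timedelta(minutes=10),
                tolerance=timedelta(seconds=30)):
    """True if every gap is within `tolerance` of `period`."""
    gaps = intervals(timestamps)
    return bool(gaps) and all(abs(gap - period) <= tolerance for gap in gaps)

# Hypothetical sample data: times at which the power management
# status event was seen, typed in by hand for illustration only.
sample = [
    "2024-01-01 09:00:02",
    "2024-01-01 09:10:05",
    "2024-01-01 09:20:01",
]

for gap in intervals(sample):
    print(gap)
print("periodic:", is_periodic(sample))
```

If the gaps between events match the gaps between the lost connections, that supports the idea that the periodic power management check is involved; if they do not, the cause is probably elsewhere.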

Any ideas here? Thanks.

Responses

Welcome John O'Day

I'm not familiar with RHEV, but it's supported, so you can open a case if this solution for an older version doesn't help: https://access.redhat.com/solutions/314183

Do you have some visibility into the system's power and hardware status? For example, if you are using Dell, they have iDRAC management, and you can take a basic (I think) look at the status of the hardware. Your mention of a reset every 10 minutes due to Power Management makes me wonder if you are having actual power issues... maybe? I'm not clear on that. Regardless, I'd rule out hardware if possible, examine other logs, and have a look at journalctl if possible.

Dell was just an example; whatever vendor you use for hardware, see whether there is a means to check for hardware issues. Maybe even look at the system itself to see if there's anything obvious.

As I said, I'm not familiar with RHEV, so this is only an initial thought. I'm not sure whether my concerns above are warranted, but they may be good to rule out.

Regards,
RJ

Hi RJ. Thanks.

The issue is with the Hosted Engine, not the physical server itself. The server was rebooting on its own due to iDRAC and the settings in RHEV, but I solved that issue. The server itself is corrupted, so right now it is of no use as it won't boot. The issue is in the RHEV settings, maybe Fencing.

I am brand new to this, so I am hesitant to change too many settings in case the entire system stops functioning. Before opening a ticket I have to go through our program's help first. I thought I would try a community group as well, as I have found them helpful in the past.

John

Sorry to hear of your woes John,

I had seriously wondered if you were facing true hardware issues; interesting that it was iDRAC-related and down to settings in RHEV too.

If you find the specific thing that resolved it, and can sanely post it here, it may help others.

Hope things go better for you with this.
RJ