All the Virtual Machines in a RHHI Setup Got Paused or Rebooted

Solution Verified - Updated -

Issue

All the virtual machines running over Red Hat Hyperconverged Infrastructure got paused or rebooted. The DB audit logs show error messages stating that the hosts are fenced due to a hearbeat failure. Example:

        timezone         |      correlation_id      |                        message                                                                            
-------------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------
 2018-07-05 02:38:49.443 |                          | Host host1.example.net  has network interface which exceeded the defined threshold [95%] (bond0: transmit rate[100%], receive rate [100%])

 2018-07-05 07:56:05.339 |                          | VDSM host1.example.net command GlusterServersListVDS failed: Heartbeat exceeded

 2018-07-05 07:56:05.349 |                          | Host host1.example.net  is not responding. It will stay in Connecting state for a grace period of 67 seconds and after that an attempt to fence the host will be issued.

 2018-07-05 07:57:36.02  |                          | Executing power management status on Host host1.example.net using Proxy Host host2.example.net  and Fence Agent ipmilan:10.0.0.1

 2018-07-05 07:58:52.882 | 6592b36c                 | Gluster command [gluster peer status] failed on server 10.0.0.1

Environment

Red Hat Hyperconverged Infrastructure 1.x versions below 1.8

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content