All the Virtual Machines in a RHHI Setup Got Paused or Rebooted

Solution In Progress - Updated -

Issue

All the virtual machines running over Red Hat Hyperconverged Infrastructure got paused or rebooted. The DB audit logs show error messages stating that the hosts are fenced due to a hearbeat failure. Example:

        timezone         |      correlation_id      |                        message                                                                            
-------------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------
 2018-07-05 02:38:49.443 |                          | Host host1.example.net  has network interface which exceeded the defined threshold [95%] (bond0: transmit rate[100%], receive rate [100%])

 2018-07-05 07:56:05.339 |                          | VDSM host1.example.net command GlusterServersListVDS failed: Heartbeat exceeded

 2018-07-05 07:56:05.349 |                          | Host host1.example.net  is not responding. It will stay in Connecting state for a grace period of 67 seconds and after that an attempt to fence the host will be issued.

 2018-07-05 07:57:36.02  |                          | Executing power management status on Host host1.example.net using Proxy Host host2.example.net  and Fence Agent ipmilan:10.0.0.1

 2018-07-05 07:58:52.882 | 6592b36c                 | Gluster command [gluster peer status] failed on server 10.0.0.1

Environment

Red Hat Hyperconverged Infrastructure 1.x

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase of over 48,000 articles and solutions.

Current Customers and Partners

Log in for full access

Log In
Close

Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.