Controller host vbmc (virtualbmc) went down and caused cluster instability

Solution In Progress - Updated -

Issue

  • Sometimes virtualbmc instances go down. In the output below, overcloud-controller-2 is reported as Down:
[root@hostname01 contrail]# vbmc list
+-----------------------------+---------+---------+------+
|         Domain name         |  Status | Address | Port |
+-----------------------------+---------+---------+------+
|   domain01                  | running |    ::   | 643  |
|   domain02                  | running |    ::   | 641  |
|   overcloud-controller-0    | running |    ::   | 650  |
|   overcloud-controller-1    | running |    ::   | 642  |
|   overcloud-controller-2    | Down    |    ::   | 630  |
+-----------------------------+---------+---------+------+
  • When a virtualbmc instance goes down, Pacemaker detects the failure and can no longer use the corresponding fencing devices. The stonith start operations time out and the cluster reports failed actions:
Failed Actions: 
* stonith-fence_ipmilan-525400d4fbbb_start_0 on overcloud-controller-1 'unknown error' (1): call=147, status=Timed Out, exitreason='', last-rc-change='Fri Dec 14 14:48:56 2018', queued=0ms, exec=20042ms 
* stonith-fence_ipmilan-525400d4fbbb_start_0 on overcloud-controller-2 'unknown error' (1): call=187, status=Timed Out, exitreason='', last-rc-change='Mon Dec 24 12:14:58 2018', queued=0ms, exec=20006ms 
* galera_monitor_10000 on galera-bundle-1 'unknown error' (1): call=15972, status=complete, exitreason='local node <overcloud-controller-1> is started, but not in primary mode. Unknown state.', last-rc-change='Mon Dec 24 11:45:12 2018', queued=0ms, exec=0ms
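
To spot the condition shown above before Pacemaker starts failing fence operations, the `vbmc list` table can be checked for any instance whose status is not `running`. The following is a minimal sketch, not part of the original solution: the function name `down_domains` and the `SAMPLE` constant are hypothetical, and the parser assumes the four-column table layout shown in this article.

```python
from typing import List

# Sample text copied from the `vbmc list` output in this article.
SAMPLE = """\
+-----------------------------+---------+---------+------+
|         Domain name         |  Status | Address | Port |
+-----------------------------+---------+---------+------+
|   domain01                  | running |    ::   | 643  |
|   domain02                  | running |    ::   | 641  |
|   overcloud-controller-0    | running |    ::   | 650  |
|   overcloud-controller-1    | running |    ::   | 642  |
|   overcloud-controller-2    | Down    |    ::   | 630  |
+-----------------------------+---------+---------+------+
"""


def down_domains(vbmc_list_output: str) -> List[str]:
    """Return the domain names whose Status column is not 'running'.

    Assumes the four-column table layout shown above
    (Domain name, Status, Address, Port).
    """
    down = []
    for line in vbmc_list_output.splitlines():
        line = line.strip()
        # Skip border lines ("+---+") and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows and the header row.
        if len(cells) != 4 or cells[0] == "Domain name":
            continue
        name, status = cells[0], cells[1]
        if status.lower() != "running":
            down.append(name)
    return down


if __name__ == "__main__":
    print(down_domains(SAMPLE))
```

On the hypervisor host, the same function could be fed the live output of `vbmc list` (for example, captured with `subprocess.run`), assuming `vbmc` is on the PATH of the user running the check. This only detects the state; it does not address why the instance went down.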

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
