Controller host vbmc (virtualbmc) goes down and causes cluster instability
Issue
- virtualbmc instances occasionally go down:
[root@hostname01 contrail]# vbmc list
+------------------------+---------+---------+------+
| Domain name            | Status  | Address | Port |
+------------------------+---------+---------+------+
| domain01               | running | ::      | 643  |
| domain02               | running | ::      | 641  |
| overcloud-controller-0 | running | ::      | 650  |
| overcloud-controller-1 | running | ::      | 642  |
| overcloud-controller-2 | Down    | ::      | 630  |
+------------------------+---------+---------+------+
- When a virtualbmc instance goes down, Pacemaker detects the failure and can no longer use the corresponding fencing devices:
Failed Actions:
* stonith-fence_ipmilan-525400d4fbbb_start_0 on overcloud-controller-1 'unknown error' (1): call=147, status=Timed Out, exitreason='', last-rc-change='Fri Dec 14 14:48:56 2018', queued=0ms, exec=20042ms
* stonith-fence_ipmilan-525400d4fbbb_start_0 on overcloud-controller-2 'unknown error' (1): call=187, status=Timed Out, exitreason='', last-rc-change='Mon Dec 24 12:14:58 2018', queued=0ms, exec=20006ms
* galera_monitor_10000 on galera-bundle-1 'unknown error' (1): call=15972, status=complete, exitreason='local node <overcloud-controller-1> is started, but not in primary mode. Unknown state.', last-rc-change='Mon Dec 24 11:45:12 2018', queued=0ms, exec=0ms
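A down instance is easy to miss in the `vbmc list` table, so a small filter may help spot it before Pacemaker reports fencing timeouts. The following is a minimal sketch only: the function name `vbmc_not_running` is hypothetical, and it assumes the table layout shown above, with the domain name in the first column and the status in the second.

```shell
#!/bin/sh
# Hypothetical helper (not part of virtualbmc): reads `vbmc list` table
# output on stdin and prints the domain name of every instance whose
# Status column is anything other than "running" (case-insensitive).
vbmc_not_running() {
    awk -F'|' '
        NF >= 5 {   # data and header rows contain "|"; border lines do not
            name = $2
            status = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim cell padding
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (name != "Domain name" && tolower(status) != "running")
                print name
        }
    '
}

# Usage (assumes vbmc is on PATH):
#   vbmc list | vbmc_not_running
```

With the listing above as input, this would print only `overcloud-controller-2`. The filter reports state and does not restart anything.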
Environment
- Red Hat OpenStack Platform 13.0 (RHOSP)