Instances are not evacuated from a crashed/failed compute node. "No route to host" and "Unable to obtain correct plug status" can be seen in the logs
Issue
- Configured Instance HA following the guide and crashed a compute node using the
echo c > /proc/sysrq-trigger
command to test instance failover, but the instances are not evacuated from the failed host.
- From the controllers and computes, the IPMI network is reachable using
fence_ipmilan
and shows the correct status as below.
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.20 -P -l Administrator -p IPMILANPW; -o status -vv
Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.x.20 -U Administrator -P [set] -p 623 -L ADMINISTRATOR chassis power status
0 Chassis Power is off
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.15 -P -l Administrator -p IPMILANPW; -o status
Status: ON
[heat-admin@ctrl-01 ~]$
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.16 -P -l Administrator -p IPMILANPW; -o status
Status: ON
- While trying to evacuate the instances, the following messages are reported in the corosync log on the controller nodes.
Dec 28 17:27:01 [44777] ctrl-01.localdomain stonith-ng: notice: remote_op_done: Operation reboot of overcloud-compute-06 by <no-one> for crmd.21672@ctrl-02.582858d2: No route to host
Dec 28 17:27:01 [44781] ctrl-01.localdomain crmd: notice: tengine_stonith_notify: Peer overcloud-compute-06 was not terminated (reboot) by <anyone> for ctrl-02: No route to host (ref=582858d2-cdb1-472f-bfac-ef5f4302188e) by client crmd.21672
Dec 28 17:29:02 [44777] ctrl-01.localdomain stonith-ng: notice: remote_op_done: Operation reboot of overcloud-compute-06 by <no-one> for crmd.21672@ctrl-02.59868647: No route to host
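To see at a glance which nodes failed to fence and why, the stonith-ng lines above can be summarized per node. The following is a minimal sketch, not part of the original procedure: the function name failed_fence_ops is hypothetical, it assumes the log lines have the same "remote_op_done: Operation ... of ... by ...: <result>" shape as the messages shown above, and it assumes that successful operations end in ": OK".

```shell
# Summarize fencing operations that did not succeed, counted per target node.
# Argument: path to the cluster log file to inspect.
# Assumption: successful operations end with ": OK" and are filtered out.
failed_fence_ops() {
    grep -E 'stonith-ng:.*remote_op_done: Operation [a-z]+ of ' "$1" \
        | grep -v ': OK$' \
        | sed -E 's/.*Operation ([a-z]+) of ([^ ]+) by .*: ([^:]+)$/\2 \1 \3/' \
        | sort \
        | uniq -c
}

# Example call (the log path is an assumption, adjust it for your system):
#   failed_fence_ops /var/log/cluster/corosync.log
```

Each output line shows a count followed by the node name, the requested action, and the reported error, so repeated failures against the same compute node stand out.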
Environment
- Red Hat OpenStack Platform 9.0