Instances are not evacuated from a crashed/failed compute node. "No route to host" and "Unable to obtain correct plug status" can be seen in the logs
Issue
- Configured Instance HA following the guide and crashed a compute node by running echo c > /proc/sysrq-trigger to test instance failover, but the instances are not evacuated from the failed host.
- From the controllers and computes, the IPMI network is reachable using fence_ipmilan, which shows the correct status as below.
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.20 -P -l Administrator -p IPMILANPW; -o status -vv
Executing: /usr/bin/ipmitool -I lanplus -H xx.xx.x.20 -U Administrator -P [set] -p 623 -L ADMINISTRATOR chassis power status
0 Chassis Power is off
*
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.15 -P -l Administrator -p IPMILANPW; -o status
Status: ON
[heat-admin@ctrl-01 ~]$
[heat-admin@ctrl-01 ~]$ fence_ipmilan -a xx.xx.x.16 -P -l Administrator -p IPMILANPW; -o status
Status: ON
- While trying to evacuate the instances, the following messages are reported in the corosync log on the controller nodes:
Dec 28 17:27:01 [44777] ctrl-01.localdomain stonith-ng: notice: remote_op_done: Operation reboot of overcloud-compute-06 by <no-one> for crmd.21672@ctrl-02.582858d2: No route to host
Dec 28 17:27:01 [44781] ctrl-01.localdomain crmd: notice: tengine_stonith_notify: Peer overcloud-compute-06 was not terminated (reboot) by <anyone> for ctrl-02: No route to host (ref=582858d2-cdb1-472f-bfac-ef5f4302188e) by client crmd.21672
Dec 28 17:29:02 [44777] ctrl-01.localdomain stonith-ng: notice: remote_op_done: Operation reboot of overcloud-compute-06 by <no-one> for crmd.21672@ctrl-02.59868647: No route to host
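To see at a glance which compute nodes failed to fence and with what error, the stonith-ng lines above can be summarised with a small filter. This is only a sketch: the function name failed_fence_ops is hypothetical, the pattern is derived from the log lines shown in this article, and it assumes that successful operations end in ": OK".

```shell
#!/bin/sh
# Hypothetical helper: summarise failed fencing operations from cluster log
# lines read on stdin. Output format: "<count> <target> <operation> <error>".
failed_fence_ops() {
    # Keep only stonith-ng remote_op_done lines and reduce each one to
    # "<target> <operation> <result>".
    sed -n 's/.*stonith-ng:[[:space:]]*notice:[[:space:]]*remote_op_done:[[:space:]]*Operation \([^ ]*\) of \([^ ]*\) by .*: \(.*\)$/\2 \1 \3/p' |
        # Assumption: successful operations report "OK" as the result.
        grep -v ' OK$' |
        # Count identical target/operation/error combinations.
        awk '{ seen[$0]++ } END { for (k in seen) print seen[k], k }' |
        sort -k2
}

# Example invocation (the log path is an assumption; adjust for your system):
#   failed_fence_ops < /var/log/cluster/corosync.log
```

Run against the messages in this article, the filter would report two failed reboot operations for overcloud-compute-06, both with "No route to host". The crmd tengine_stonith_notify line is ignored because it repeats the same failure.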
Environment
- Red Hat OpenStack Platform 9.0
