How to configure instance HA using tags
Environment
- Red Hat OpenStack Platform 10 (Newton)
Issue
- Instance HA without tags configured in either flavors or images works fine.
- Instance HA with instances deployed only from a flavor tagged evacuable=true works fine.
- Instance HA with instances deployed from both a flavor tagged evacuable=true and another flavor without the tag does not work correctly.
- Instance HA with instances deployed from both a flavor tagged evacuable=true and another flavor tagged evacuable=false does not work correctly.
Resolution
Red Hat Enterprise Linux 7
- The issue (BZ #1600600) has been resolved with errata RHBA-2018:3031 with the following package(s): fence-agents-4.2.1-11.el7 or later.
Workaround
Follow our official documentation to configure Instance HA; it describes where to download the Ansible scripts and how to configure Instance HA with them.
When tags are defined, all instances tagged evacuable=true will be evacuated (if there are enough resources available) in case of a compute failure. Instances with no tag, or tagged evacuable=false, will not be evacuated.
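For example, the tag is set as a property on a flavor or an image (the flavor command below also appears in the Diagnostic Steps; the image name is only a placeholder):
[stack@undercloud ~]$ openstack flavor set m1.tiny-evac --property evacuable=true
[stack@undercloud ~]$ openstack image set --property evacuable=true rhel7-guest  # rhel7-guest is a placeholder image name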
The playbooks that deploy Instance HA are placed in /home/stack/ansible-instanceha. In that directory, there is an install.sh script which deploys Instance HA.
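For example (a minimal sketch, assuming the scripts have already been downloaded to the location above):
[stack@undercloud ~]$ cd /home/stack/ansible-instanceha
[stack@undercloud ansible-instanceha]$ ./install.sh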
A problem was detected during installation: some stonith devices were not created and the installation failed. The following error makes the installation fail:
failed: [undercloud -> controller-2] (item=compute1) => {"changed": true, "cmd": "pcs stonith show ipmilan-compute1", "delta": "0:00:00.263657", "end": "2018-07-06 10:40:29.698450", "failed": true, "item": "compute1", "msg": "non-zero return code", "rc": 1, "start": "2018-07-06 10:40:29.434793", "stderr": "Error: unable to find resource 'ipmilan-compute1'", "stderr_lines": ["Error: unable to find resource 'ipmilan-compute1'"], "stdout": "", "stdout_lines": []}
To solve this, the missing device was created manually after the failure:
[root@controller0 ~]# pcs stonith create ipmilan-compute1 fence_ipmilan pcmk_host_list="compute1" ipaddr=192.168.1.6 action="reboot" login="admin" passwd="password" delay=20 op monitor interval=60s
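The device can then be verified with the same command the playbook runs:
[root@controller0 ~]# pcs stonith show ipmilan-compute1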
To avoid a new failure, the nova-evacuate cluster resource must be deleted:
[root@controller0 ~]# pcs resource delete nova-evacuate
If this resource is not deleted, the following error will arise:
TASK [instance-ha : Create resource nova-evacuate (no_shared_storage)] ********************************************************************************************************************************************
fatal: [undercloud -> controller-2]: FAILED! => {"changed": true, "cmd": "pcs resource create nova-evacuate ocf:openstack:NovaEvacuate auth_url=$OS_AUTH_URL username=$OS_USERNAME password=$OS_PASSWORD tenant_name=$OS_TENANT_NAME no_shared_storage=1", "delta": "0:00:00.274869", "end": "2018-07-06 10:53:47.636635", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-07-06 10:53:47.361766", "stderr": "Error: 'nova-evacuate' already exists", "stderr_lines": ["Error: 'nova-evacuate' already exists"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.retry
After deleting the resource, it is necessary to run the Instance HA installation again.
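For example, from the playbook directory:
[stack@undercloud ansible-instanceha]$ ./install.sh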
After the installation, the controllers could be treated as computes, so the following constraints should not be present:
Resource: nova-compute-checkevacuate-clone
Constraint: location-nova-compute-checkevacuate-clone (resource-discovery=exclusive)
Rule: score=0
Expression: osprole eq compute
Constraint: location-nova-compute-checkevacuate-clone-1 (resource-discovery=exclusive)
Rule: score=0
Expression: osprole eq controller
Resource: nova-compute-clone
Constraint: location-nova-compute-clone (resource-discovery=exclusive)
Rule: score=0
Expression: osprole eq compute
Constraint: location-nova-compute-clone-1 (resource-discovery=exclusive)
Rule: score=0
Expression: osprole eq controller
If they are present, they must be deleted:
[root@controller0 ~]# pcs constraint remove location-nova-compute-checkevacuate-clone-1
[root@controller0 ~]# pcs constraint remove location-nova-compute-clone-1
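To confirm that only the compute-side constraints remain, list the location constraints again:
[root@controller0 ~]# pcs constraint location show --full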
Those constraints do not seem to be added by the Ansible playbook; the issue is being tracked by Red Hat.
Stonith has to be enabled:
[root@controller0 ~]# pcs property set stonith-enabled=true
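The current value of the property can be checked with:
[root@controller0 ~]# pcs property show stonith-enabled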
Due to an issue in the fence_evacuate agent, it is necessary to apply one of the following errata:
- The issue (BZ #1600602) has been resolved with errata RHBA-2018:2459 with the following package(s): fence-agents-4.0.11-86.el7_5.3, fence-agents-all-4.0.11-86.el7_5.3, fence-agents-common-4.0.11-86.el7_5.3 or later.
- The issue (BZ #1600600) has been resolved with errata RHBA-2018:2416 with the following package(s): fence-agents-4.0.11-66.el7_4.8, fence-agents-all-4.0.11-66.el7_4.8, fence-agents-common-4.0.11-66.el7_4.8 or later, for RHEL 7.4.z releases.
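A minimal sketch of applying the fix, assuming the overcloud nodes are attached to the appropriate repositories (the package names come from the errata above):
[root@controller0 ~]# yum update fence-agents fence-agents-all fence-agents-common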
To copy the patched fence_evacuate agent to the controllers, run the playbook shipped with the Instance HA scripts:
[stack@undercloud ~]$ cd ansible-instanceha
[stack@undercloud ansible-instanceha]$ ansible-playbook -i hosts copy_patched_fence_evacuate_to_controllers.yaml
Diagnostic Steps
To simulate a compute crash, the following command was used:
[root@compute ~]# echo c > /proc/sysrq-trigger
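Note that the magic SysRq interface must be enabled for this trigger to work; if needed, enable it first (a value of 1 enables all SysRq functions):
[root@compute ~]# echo 1 > /proc/sys/kernel/sysrq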
After the crash, Instance HA reboots the compute node; once it boots, the following cluster resources are stopped on it:
# nova-compute-checkevacuate-clone
# nova-compute-clone
This is the expected behaviour. After checking that the compute node is in good shape to run workloads, clean up the resources:
[root@controller ~]# pcs resource cleanup nova-compute-checkevacuate-clone
[root@controller ~]# pcs resource cleanup nova-compute-clone
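After the cleanup, both clones should be reported as Started on the compute node again, for example:
[root@controller ~]# pcs status | grep -A1 'Clone Set: nova-compute'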
Testing steps verification
- Environment: RHOSP10, RHEL 7.5, fence-agents-4.2.1-2.el7, IHA set up using the tripleo-ha-utils repo.
If RHOSP10 is using RHEL 7.4, make sure that the SELinux RPMs are updated in the overcloud.
- Test results: All instances with the flavor property evacuable=true were evacuated; instances with no property were not evacuated and were left in SHUTOFF state; instances with the property evacuable=false were not evacuated.
- Procedure: Created 8 instances, 4 with the evacuable=true flavor and 4 without, then hard rebooted one compute. Created 8 instances, 4 with the evacuable=true flavor and 4 with the evacuable=false flavor, then hard rebooted one compute.
- Details:
pcs status:
controller-0 | SUCCESS | rc=0 >>
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Thu Jul 12 21:50:45 2018
Last change: Thu Jul 12 13:31:02 2018 by hacluster via crmd on controller-1
5 nodes configured
46 resources configured
Online: [ controller-0 controller-1 controller-2 ]
RemoteOnline: [ compute-0 compute-1 ]
Full list of resources:
ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started controller-2
ip-172.17.4.11 (ocf::heartbeat:IPaddr2): Started controller-0
Clone Set: haproxy-clone [haproxy]
Started: [ controller-0 controller-1 controller-2 ]
Stopped: [ compute-0 compute-1 ]
Master/Slave Set: galera-master [galera]
Masters: [ controller-0 controller-1 controller-2 ]
Stopped: [ compute-0 compute-1 ]
ip-172.17.1.19 (ocf::heartbeat:IPaddr2): Started controller-1
ip-10.0.0.107 (ocf::heartbeat:IPaddr2): Started controller-2
ip-172.17.3.15 (ocf::heartbeat:IPaddr2): Started controller-0
Clone Set: rabbitmq-clone [rabbitmq]
Started: [ controller-0 controller-1 controller-2 ]
Stopped: [ compute-0 compute-1 ]
Master/Slave Set: redis-master [redis]
Masters: [ controller-1 ]
Slaves: [ controller-0 controller-2 ]
Stopped: [ compute-0 compute-1 ]
ip-172.17.1.14 (ocf::heartbeat:IPaddr2): Started controller-1
openstack-cinder-volume (systemd:openstack-cinder-volume): Started controller-2
ipmilan-controller-2 (stonith:fence_ipmilan): Stopped
ipmilan-controller-1 (stonith:fence_ipmilan): Stopped
ipmilan-controller-0 (stonith:fence_ipmilan): Stopped
ipmilan-compute-0 (stonith:fence_ipmilan): Stopped
ipmilan-compute-1 (stonith:fence_ipmilan): Stopped
nova-evacuate (ocf::openstack:NovaEvacuate): Started controller-0
Clone Set: nova-compute-checkevacuate-clone [nova-compute-checkevacuate]
Started: [ compute-0 compute-1 ]
Stopped: [ controller-0 controller-1 controller-2 ]
Clone Set: nova-compute-clone [nova-compute]
Started: [ compute-0 compute-1 ]
Stopped: [ controller-0 controller-1 controller-2 ]
fence-nova (stonith:fence_compute): Started controller-1
compute-1 (ocf::pacemaker:remote): Started controller-0
compute-0 (ocf::pacemaker:remote): Started controller-1
[stack@undercloud-0 ~]$ openstack flavor show m1.tiny
+----------------------------+---------+
| Field | Value |
+----------------------------+---------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 1 |
| id | 0 |
| name | m1.tiny |
| os-flavor-access:is_public | True |
| properties | |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+---------+
[stack@undercloud-0 ~]$ openstack flavor create --id 1 --vcpus 1 --ram 64 --disk 1 m1.tiny-evac
+----------------------------+--------------+
| Field | Value |
+----------------------------+--------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 1 |
| id | 1 |
| name | m1.tiny-evac |
| os-flavor-access:is_public | True |
| properties | |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+--------------+
[stack@undercloud-0 ~]$ openstack flavor set m1.tiny-evac --property evacuable=true
[stack@undercloud-0 ~]$ openstack flavor show m1.tiny-evac
+----------------------------+------------------+
| Field | Value |
+----------------------------+------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 1 |
| id | 1 |
| name | m1.tiny-evac |
| os-flavor-access:is_public | True |
| properties | evacuable='true' |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+------------------+
[stack@undercloud-0 ~]$ date;for i in `openstack server list -cID -fvalue`;do openstack server show $i |grep -w 'name \|flavor\|id\|OS-EXT-SRV-ATTR:host\|status';echo '';done
Thu Jul 12 17:48:48 EDT 2018
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 907657b9-9e02-4972-a642-8c1148b72469 |
| name | osvm-evac-4 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 45c627f2-bcf9-4826-aa35-9cbe7d328327 |
| name | osvm-evac-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | e4238c23-9fd9-4867-a72d-a34e328430b0 |
| name | osvm-evac-2 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 728cd54f-0fd1-4c7f-beff-d8abae178a7d |
| name | osvm-evac-1 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | 03e7bd0e-23b6-441c-88e7-36cbff593280 |
| name | osvm-4 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 33d9653e-3e8c-4e38-a5e2-d58d51fdcbeb |
| name | osvm-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | ce67f123-1803-4766-ad62-98eada6f1e01 |
| name | osvm-2 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 7ce7e96f-5ba1-484c-9b5d-15b11ab9e65b |
| name | osvm-1 |
| status | ACTIVE |
Kill compute-0 using echo b > /proc/sysrq-trigger:
...
[stack@undercloud-0 ~]$ date;for i in `openstack server list -cID -fvalue`;do openstack server show $i |grep -w 'name \|flavor\|id\|OS-EXT-SRV-ATTR:host\|status';echo '';done
Thu Jul 12 18:03:02 EDT 2018
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 907657b9-9e02-4972-a642-8c1148b72469 |
| name | osvm-evac-4 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 45c627f2-bcf9-4826-aa35-9cbe7d328327 |
| name | osvm-evac-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | e4238c23-9fd9-4867-a72d-a34e328430b0 |
| name | osvm-evac-2 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 728cd54f-0fd1-4c7f-beff-d8abae178a7d |
| name | osvm-evac-1 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | 03e7bd0e-23b6-441c-88e7-36cbff593280 |
| name | osvm-4 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 33d9653e-3e8c-4e38-a5e2-d58d51fdcbeb |
| name | osvm-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | ce67f123-1803-4766-ad62-98eada6f1e01 |
| name | osvm-2 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 7ce7e96f-5ba1-484c-9b5d-15b11ab9e65b |
| name | osvm-1 |
| status | ACTIVE |
[stack@undercloud-0 ~]$ openstack flavor set m1.tiny --property evacuable=false
[stack@undercloud-0 ~]$ date;for i in `openstack server list -cID -fvalue`;do openstack server show $i |grep -w 'name \|flavor\|id\|OS-EXT-SRV-ATTR:host\|status';echo '';done
Thu Jul 12 18:25:24 EDT 2018
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 907657b9-9e02-4972-a642-8c1148b72469 |
| name | osvm-evac-4 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 45c627f2-bcf9-4826-aa35-9cbe7d328327 |
| name | osvm-evac-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | e4238c23-9fd9-4867-a72d-a34e328430b0 |
| name | osvm-evac-2 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 728cd54f-0fd1-4c7f-beff-d8abae178a7d |
| name | osvm-evac-1 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | 03e7bd0e-23b6-441c-88e7-36cbff593280 |
| name | osvm-4 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 33d9653e-3e8c-4e38-a5e2-d58d51fdcbeb |
| name | osvm-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | ce67f123-1803-4766-ad62-98eada6f1e01 |
| name | osvm-2 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 7ce7e96f-5ba1-484c-9b5d-15b11ab9e65b |
| name | osvm-1 |
| status | ACTIVE |
Kill compute-1 via echo b > /proc/sysrq-trigger:
...
[stack@undercloud-0 ~]$ date;for i in `openstack server list -cID -fvalue`;do openstack server show $i |grep -w 'name \|flavor\|id\|OS-EXT-SRV-ATTR:host\|status';echo '';done
Thu Jul 12 18:30:06 EDT 2018
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 907657b9-9e02-4972-a642-8c1148b72469 |
| name | osvm-evac-4 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 45c627f2-bcf9-4826-aa35-9cbe7d328327 |
| name | osvm-evac-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | e4238c23-9fd9-4867-a72d-a34e328430b0 |
| name | osvm-evac-2 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny-evac (1) |
| id | 728cd54f-0fd1-4c7f-beff-d8abae178a7d |
| name | osvm-evac-1 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | 03e7bd0e-23b6-441c-88e7-36cbff593280 |
| name | osvm-4 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 33d9653e-3e8c-4e38-a5e2-d58d51fdcbeb |
| name | osvm-3 |
| status | ACTIVE |
| OS-EXT-SRV-ATTR:host | compute-0.localdomain |
| flavor | m1.tiny (0) |
| id | ce67f123-1803-4766-ad62-98eada6f1e01 |
| name | osvm-2 |
| status | SHUTOFF |
| OS-EXT-SRV-ATTR:host | compute-1.localdomain |
| flavor | m1.tiny (0) |
| id | 7ce7e96f-5ba1-484c-9b5d-15b11ab9e65b |
| name | osvm-1 |
| status | ACTIVE |