Etcd scaleup fails when first master/etcd node in ansible inventory fails
Environment
- Red Hat Enterprise Linux (RHEL) 7.6
- Red Hat OpenShift Container Platform (OCP) v3.9
- Red Hat Ansible Engine 2.4
Issue
- While trying to add an existing lost master/etcd node (typically the first master in the Ansible hosts file) back into the cluster, the openshift-etcd/scaleup.yml playbook fails:
"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member https://x.x.x.x:2379 has no leader\n; error #1: client: etcd member https://x.x.x.x:2379 has no leader", "stderr_lines": ["client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member https://x.x.x.x:2379 has no leader", "; error #1: client: etcd member https://x.x.x.x:2379 has no leader"], "stdout": "", "stdout_lines": []}
...
FAILED! => {"failed": true, "msg": "last_checked_host: redhat.example.com, last_checked_var: ansible_python;'NoneType' object has no attribute '__getitem__'"}
Resolution
- Bring back the first master/etcd node by starting the master services: run the playbooks/openshift-master/scaleup.yml playbook.
- Redeploy the etcd certs, first new-etcd-ca and then etcd-certificates.
- Run the etcd scale up playbook.
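The resolution steps above can be sketched as a small shell wrapper. This is only an illustration under stated assumptions: the base path is taken from the log output and the command shown elsewhere in this article, both scaleup playbooks are assumed to live under it, and the function names and the DRY_RUN variable are hypothetical. The certificate redeploy step is left as a manual placeholder because this article does not give the playbook paths for it.

```shell
# Assumed install location of the openshift-ansible playbooks.
PLAYBOOKS=/usr/share/ansible/openshift-ansible/playbooks

# Print the command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
run_playbook() {
  inventory="$1"
  playbook="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ansible-playbook -i $inventory $PLAYBOOKS/$playbook"
  else
    ansible-playbook -i "$inventory" "$PLAYBOOKS/$playbook" || return 1
  fi
}

# Hypothetical helper that walks through the three resolution steps in order.
recover_first_master() {
  inventory="$1"
  # Step 1: bring the master services back on the lost node.
  run_playbook "$inventory" openshift-master/scaleup.yml || return 1
  # Step 2: redeploy the etcd certs. Playbook paths are not given here,
  # so this stays a manual step.
  echo "# manual step: redeploy etcd certs (new-etcd-ca, then etcd-certificates)"
  # Step 3: add the node back to the etcd cluster.
  run_playbook "$inventory" openshift-etcd/scaleup.yml
}
```

Calling `recover_first_master <inventory>` with DRY_RUN unset only prints the commands, which makes it possible to review the order before anything touches the cluster.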
Root Cause
- Adding an existing collocated master/etcd node (the first node defined in the Ansible inventory) using the scaleup playbook fails. The error seen is due to Ansible facts failing:
2019-06-27 13:55:58,233 p=90215 u=root | TASK [Run variable sanity checks] **************************************************************************************
2019-06-27 13:55:58,233 p=90215 u=root | task path: /usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:13
2019-06-27 13:55:58,411 p=90215 u=root | fatal: [redhat1.example.com]: FAILED! => {
"failed": true,
"msg": "last_checked_host: redhat1.example.com, last_checked_var: ansible_python;'NoneType' object has no attribute '__getitem__'"
}
- An example of a typical inventory file:
[masters]
#redhat1.example.com
redhat2.example.com
redhat3.example.com
# host group for etcd
[etcd]
#redhat1.example.com
redhat2.example.com
redhat3.example.com
[new_etcd]
redhat1.example.com
- The scaleup playbook recreates the directory /etc/etcd/ca with the new CA. However, the old etcd configuration still points to the old CA.
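For the inventory layout shown in the example above, a quick check can confirm that the recovered host is listed under [new_etcd] and is commented out of [masters] and [etcd]. The sketch below assumes a plain INI inventory with one host per line. The function names are hypothetical, and the rule it checks simply mirrors the example inventory rather than any documented requirement.

```shell
# List the uncommented hosts of one inventory group.
# Usage: group_hosts <inventory-file> <group-name>
group_hosts() {
  awk -v group="$2" '
    /^[ \t]*(#|;|$)/ { next }                          # skip comments and blank lines
    /^\[/ { in_group = ($0 == "[" group "]"); next }   # track the current section
    in_group { print $1 }                              # first field is the host name
  ' "$1"
}

# Warn about hosts in [new_etcd] that are still active in [masters] or [etcd].
check_new_etcd() {
  inventory="$1"
  for host in $(group_hosts "$inventory" new_etcd); do
    for group in masters etcd; do
      if group_hosts "$inventory" "$group" | grep -qxF "$host"; then
        echo "WARNING: $host is in [new_etcd] and still active in [$group]"
      fi
    done
  done
}
```

Running `check_new_etcd <inventory>` prints nothing when the inventory matches the layout in the example.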
Diagnostic Steps
- Collect the Ansible run logs with the -vvv option while running the playbooks, e.g.:
$ ansible-playbook -i <inventory> /usr/share/ansible/openshift-ansible/playbooks/config.yml -vvv |tee ansible-logs.out
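Once the log is collected, the failing tasks can be pulled out of the verbose output with a simple filter. This is a minimal sketch, assuming the log was written to ansible-logs.out as in the command above. The function name is hypothetical, and the patterns are based only on the failure lines quoted in this article.

```shell
# Print, with line numbers, the lines that mark a failed task or carry
# its error message.
# Usage: show_failures <log-file>
show_failures() {
  grep -nE 'FAILED!|fatal:|"msg":' "$1"
}
```

For example, `show_failures ansible-logs.out` lists the `fatal:` lines together with their `"msg":` text, which is usually enough to tell which host and which check failed.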