Etcd scaleup fails when the first master/etcd node in the Ansible inventory fails

Environment

Red Hat Enterprise Linux (RHEL) 7.6
Red Hat OpenShift Container Platform (OCP) v3.9
Red Hat Ansible Engine 2.4

Issue

When adding a lost master/etcd node (typically the first master listed in the Ansible hosts file) back into the cluster, the playbooks/openshift-etcd/scaleup.yml playbook fails with errors similar to the following:

"client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member https://x.x.x.x:2379 has no leader\n; error #1: client: etcd member https://x.x.x.x:2379 has no leader", "stderr_lines": ["client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member https://x.x.x.x:2379 has no leader", "; error #1: client: etcd member https://x.x.x.x:2379 has no leader"], "stdout": "", "stdout_lines": []}
...

FAILED! => {"failed": true, "msg": "last_checked_host: redhat.example.com, last_checked_var: ansible_python;'NoneType' object has no attribute '__getitem__'"}

Resolution

1. Bring the first master/etcd node back by starting its master services with the playbooks/openshift-master/scaleup.yml playbook.
2. Redeploy the etcd certificates: first redeploy a new etcd CA (new-etcd-ca), then redeploy the etcd certificates (etcd-certificates).
3. Run the etcd scaleup playbook, playbooks/openshift-etcd/scaleup.yml.
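
The order of these steps can be sketched as a small shell script. This is a hedged sketch, not a supported procedure. It assumes the RPM install path /usr/share/ansible/openshift-ansible, and it assumes that the new-etcd-ca and etcd-certificates procedures map to the openshift-etcd/redeploy-ca.yml and openshift-etcd/redeploy-certificates.yml playbooks; check both names against the openshift-ansible version you have installed. The INVENTORY, PLAYBOOK_DIR and DRY_RUN variables and the run and recover_etcd_member functions are hypothetical. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Hedged sketch of the Resolution order. Assumptions (verify before use):
#   - openshift-ansible is installed from RPM under /usr/share/ansible/openshift-ansible
#   - "new-etcd-ca"       -> playbooks/openshift-etcd/redeploy-ca.yml
#   - "etcd-certificates" -> playbooks/openshift-etcd/redeploy-certificates.yml
# INVENTORY, PLAYBOOK_DIR and DRY_RUN are hypothetical variables for this sketch.
set -euo pipefail

INVENTORY="${INVENTORY:-/etc/ansible/hosts}"
PLAYBOOK_DIR="${PLAYBOOK_DIR:-/usr/share/ansible/openshift-ansible/playbooks}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print or execute one command, depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

recover_etcd_member() {
  local pb
  # Order matters: master services first, then a new etcd CA,
  # then the etcd certificates, then the etcd scaleup itself.
  for pb in openshift-master/scaleup.yml \
            openshift-etcd/redeploy-ca.yml \
            openshift-etcd/redeploy-certificates.yml \
            openshift-etcd/scaleup.yml; do
    run ansible-playbook -i "$INVENTORY" "$PLAYBOOK_DIR/$pb" -vvv
  done
}

recover_etcd_member
```

Review the printed commands first, then rerun with DRY_RUN=0 to execute them. Because of set -e, a failing playbook stops the sequence, so later steps are not attempted after an earlier one fails.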

Root Cause

Re-adding an existing colocated master/etcd node (the first node defined in the Ansible inventory) with the scaleup playbook fails. The error is raised by the variable sanity checks, because the Ansible fact lookup for that host fails:

2019-06-27 13:55:58,233 p=90215 u=root |  TASK [Run variable sanity checks] **************************************************************************************
2019-06-27 13:55:58,233 p=90215 u=root |  task path: /usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:13
2019-06-27 13:55:58,411 p=90215 u=root |  fatal: [redhat1.example.com]: FAILED! => {
    "failed": true, 
    "msg": "last_checked_host: redhat1.example.com, last_checked_var: ansible_python;'NoneType' object has no attribute '__getitem__'"
}
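
To check whether fact gathering works at all for the re-added host, Ansible can be asked to gather only the ansible_python fact for it with the setup module's filter argument. A minimal sketch, assuming the ansible CLI is on the PATH; the check_python_fact function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather only the ansible_python fact for one host.
# If this command fails, or the result contains no ansible_python key,
# fact gathering for that host is the place to investigate before
# rerunning the scaleup playbook.
set -euo pipefail

check_python_fact() {
  local inventory="$1" host="$2"
  ansible -i "$inventory" "$host" -m setup -a 'filter=ansible_python'
}
```

For example, check_python_fact /etc/ansible/hosts redhat1.example.com queries the host named in the failing task above.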

An example of a typical inventory file, with the lost node commented out of [masters] and [etcd] and listed under [new_etcd]:

[masters]
#redhat1.example.com
redhat2.example.com
redhat3.example.com

# host group for etcd
[etcd]
#redhat1.example.com
redhat2.example.com
redhat3.example.com
[new_etcd]
redhat1.example.com
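
Before rerunning a playbook, it can help to confirm which hosts each group actually contains once commented-out lines are ignored. A minimal sketch for simple INI-style inventories like the one above; the inventory_group_hosts function is hypothetical, it prints only the first field of each host line, and it does not expand [group:children] or [group:vars] sections.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hosts listed in one group of an INI-style
# Ansible inventory, skipping blank and commented-out lines.
set -euo pipefail

inventory_group_hosts() {
  local inventory="$1" group="$2"
  awk -v want="[$group]" '
    /^[[:space:]]*$/    { next }                          # blank line
    /^[[:space:]]*[#;]/ { next }                          # commented-out entry
    /^\[/               { in_group = ($1 == want); next } # group header
    in_group            { print $1 }                      # first field is the host
  ' "$inventory"
}
```

For the inventory above, inventory_group_hosts /etc/ansible/hosts new_etcd should print only redhat1.example.com, while the etcd group should list only redhat2.example.com and redhat3.example.com.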

The scaleup playbook recreates the /etc/etcd/ca directory with a new CA. However, the existing etcd configuration still points to the old CA.
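
One way to confirm this mismatch is to compare the certificate fingerprint of the CA that etcd is configured to trust with the fingerprint of the CA in /etc/etcd/ca. A minimal sketch, assuming etcd reads its trusted CA from the file named by ETCD_TRUSTED_CA_FILE in /etc/etcd/etcd.conf and that the generated CA is /etc/etcd/ca/ca.crt; treat both paths as assumptions to verify on your hosts. The comparison needs both files on one machine, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does the CA that etcd trusts match the CA in /etc/etcd/ca?
# Assumed paths (verify on your hosts): /etc/etcd/etcd.conf containing an
# ETCD_TRUSTED_CA_FILE= line, and the CA certificate /etc/etcd/ca/ca.crt.
set -euo pipefail

# Print the CA file etcd is configured to trust (surrounding quotes removed).
trusted_ca_file() {
  local conf="${1:-/etc/etcd/etcd.conf}"
  sed -n 's/^ETCD_TRUSTED_CA_FILE=//p' "$conf" | tr -d '"' | tail -n 1
}

# Print the SHA-256 fingerprint of a PEM certificate.
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}

# Succeed only if both certificates have the same fingerprint.
same_cert() {
  [ "$(cert_fingerprint "$1")" = "$(cert_fingerprint "$2")" ]
}
```

For example, same_cert "$(trusted_ca_file)" /etc/etcd/ca/ca.crt returns a non-zero status when etcd still trusts a different CA from the one the scaleup playbook generated.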

Diagnostic Steps

Collect the Ansible run logs with the -vvv option while running the playbooks, for example:

$ ansible-playbook -i <inventory> /usr/share/ansible/openshift-ansible/playbooks/config.yml -vvv | tee ansible-logs.out
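
The saved log can then be searched for the two failure signatures quoted in this article. A minimal sketch: the find_known_failures function is hypothetical, the default file name matches the tee target in the command above, and the pattern list is illustrative rather than exhaustive.

```shell
#!/usr/bin/env bash
# Sketch: list the lines of a saved -vvv log that match the failure
# signatures quoted in this article. Pattern list is illustrative only.
set -euo pipefail

find_known_failures() {
  local log="${1:-ansible-logs.out}"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  # -n adds line numbers so the surrounding -vvv context is easy to find.
  # "|| true" keeps a log with no matches from being treated as an error.
  grep -n -E 'has no leader|last_checked_var: ansible_python' "$log" || true
}
```

An empty result means neither signature appears in the log; any matches show the line numbers to inspect in the full -vvv output.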

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
