Upgrading OCP to 3.7 breaks service catalog etcd entries

Solution In Progress

Issue

I upgraded the control plane first, then the nodes, and rebooted all servers. Now I am unable to upgrade the service catalog.

The service catalog playbook fails at the API server readiness check:

TASK [openshift_service_catalog : wait for api server to be ready] ***************************************************************************************************************************
FAILED - RETRYING: wait for api server to be ready (120 retries left).
FAILED - RETRYING: wait for api server to be ready (119 retries left).
FAILED - RETRYING: wait for api server to be ready (118 retries left).
FAILED - RETRYING: wait for api server to be ready (117 retries left).
FAILED - RETRYING: wait for api server to be ready (116 retries left).
...
FAILED - RETRYING: wait for api server to be ready (5 retries left).
FAILED - RETRYING: wait for api server to be ready (4 retries left).
FAILED - RETRYING: wait for api server to be ready (3 retries left).
FAILED - RETRYING: wait for api server to be ready (2 retries left).
FAILED - RETRYING: wait for api server to be ready (1 retries left).
fatal: [ip-10-53-3-151.ec2.internal]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "-k", "https://apiserver.kube-service-catalog.svc/healthz"], "delta": "0:00:00.063268", "end": "2018-01-12 18:42:31.270849", "rc": 0, "start": "2018-01-12 18:42:31.207581", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100   180  100   180    0     0   3136      0 --:--:-- --:--:-- --:--:--  3157", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "100   180  100   180    0     0   3136      0 --:--:-- --:--:-- --:--:--  3157"], "stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed", "stdout_lines": ["[+]ping ok", "[+]poststarthook/generic-apiserver-start-informers ok", "[+]poststarthook/start-service-catalog-apiserver-informers ok", "[-]etcd failed: reason withheld", "healthz check failed"]}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/service-catalog.retry

PLAY RECAP ***********************************************************************************************************************************************************************************
ip-10-53-0-226.ec2.internal : ok=28   changed=2    unreachable=0    failed=0
ip-10-53-1-133.ec2.internal : ok=43   changed=2    unreachable=0    failed=0
ip-10-53-1-16.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
ip-10-53-1-99.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
ip-10-53-3-151.ec2.internal : ok=79   changed=18   unreachable=0    failed=1
ip-10-53-3-178.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
ip-10-53-3-240.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
ip-10-53-4-127.ec2.internal : ok=43   changed=2    unreachable=0    failed=0
ip-10-53-4-221.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
ip-10-53-4-84.ec2.internal : ok=42   changed=2    unreachable=0    failed=0
localhost                  : ok=12   changed=0    unreachable=0    failed=0


INSTALLER STATUS *****************************************************************************************************************************************************************************
Initialization             : Complete
Service Catalog Install    : In Progress
        This phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml
  • Manually curling the health check URL (https://apiserver.kube-service-catalog.svc/healthz) gives:
    • ping ok
    • poststarthook/generic-apiserver-start-informers ok
    • poststarthook/start-service-catalog-apiserver-informers ok
    • etcd failed: reason withheld
    • healthz check failed
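
The healthz response marks each passing check with [+] and each failing check with [-], as seen in the stdout field of the failed task above. A minimal Python sketch for picking out the failing checks from such a response is shown here. The function name failed_checks and the SAMPLE constant are hypothetical and used only for illustration; the sample text is copied from the task output.

```python
import re

# Sample body copied from the "stdout" field of the failed Ansible task.
SAMPLE = """[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed"""

# One check per line: "[+]" or "[-]", the check name, then a status detail.
CHECK_RE = re.compile(r"^\[(?P<mark>[+-])\](?P<name>\S+)\s+(?P<detail>.*)$")


def failed_checks(healthz_body):
    """Return a list of (name, detail) for every check marked [-]."""
    failures = []
    for line in healthz_body.splitlines():
        match = CHECK_RE.match(line.strip())
        if match and match.group("mark") == "-":
            failures.append((match.group("name"), match.group("detail")))
    return failures


if __name__ == "__main__":
    for name, detail in failed_checks(SAMPLE):
        print(f"{name}: {detail}")
```

For the response above, the only check reported as failing is etcd, which points at the service catalog API server's connection to its etcd backend rather than at the API server process itself.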

I verified that the OAB etcd container is up, but found the following error in its logs:

2018-01-12 18:13:38.544464 I | etcdserver/api/v3rpc: Failed to dial 0.0.0.0:2379: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
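
When the etcd log is long, it can help to filter it for dial failures that carry a TLS error, such as the "bad certificate" message above. The following is a rough Python sketch under the assumption that the log lines follow the same layout as the line shown here; the function name tls_dial_failures and the LOG_LINE constant are hypothetical.

```python
import re

# Log line copied from the etcd container output above.
LOG_LINE = (
    '2018-01-12 18:13:38.544464 I | etcdserver/api/v3rpc: Failed to dial '
    '0.0.0.0:2379: connection error: desc = "transport: remote error: '
    'tls: bad certificate"; please retry.'
)

# Assumed layout: 'Failed to dial <endpoint>: connection error: desc = "<text>"'
DIAL_RE = re.compile(
    r'Failed to dial (?P<endpoint>\S+): connection error: desc = "(?P<desc>[^"]*)"'
)


def tls_dial_failures(lines):
    """Return (endpoint, description) for dial failures that mention TLS."""
    failures = []
    for line in lines:
        match = DIAL_RE.search(line)
        if match and "tls:" in match.group("desc"):
            failures.append((match.group("endpoint"), match.group("desc")))
    return failures


if __name__ == "__main__":
    for endpoint, desc in tls_dial_failures([LOG_LINE]):
        print(f"{endpoint} -> {desc}")
```

A match here only confirms that a TLS handshake to the listed endpoint was rejected over a certificate; it does not by itself identify which certificate is at fault.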

Environment

  • OpenShift Container Platform
    • 3.7.9
    • 3.7.14
