OCP bare metal installation issue: Degraded state: ingress, console, monitoring, authentication, kube controller

Hi Experts,
I am trying to install RHOCP 4.12 on bare metal, using RHCOS 4.13.5 for the bootstrap node, 3 masters and 1 worker. I also have 1 VM for HAProxy and 1 for DNS.
The bootstrap has completed and oc get nodes shows all nodes as Ready. However, oc get co shows 5 operators as degraded.
[root@localhost log]# oc get co
authentication            4.12.0   False   False   True   8m13s   OAuthServerRouteEndpointAcc
ingress                   4.12.0   True    True    True   3d18h   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-65bfdb9dfc-k5z2r" cannot be scheduled: 0/4 nodes.Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
kube-controller-manager   4.12.0   True    False   True   3d19h   GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp: lookup thanos-querier.openshift-monitoring.svc on 172.30.0.10:53: read udp 10.129.0.2:36768->172.30.0.10:53: i/o timeout
machine-config            4.12.0   True    False   True   3d19h   Failed to resync 4.12.0 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required
monitoring                         False   True    True   3d19h   reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed:
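
Since the full oc get co listing is long, the degraded operators can also be pulled out by name. The following is only a minimal sketch: the function name degraded_operators is made up for illustration, and the commented usage assumes a logged-in oc client and that custom-columns accepts the JSONPath filter shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): print the names of operators whose
# Degraded condition is True.
# Input on stdin: one "NAME DEGRADED" pair per line, without a header row.
degraded_operators() {
  awk '$2 == "True" { print $1 }'
}

# Assumed usage against a live cluster (requires a logged-in oc):
# oc get co --no-headers \
#   -o custom-columns='NAME:.metadata.name,DEGRADED:.status.conditions[?(@.type=="Degraded")].status' \
#   | degraded_operators
```

Reading the Degraded condition directly avoids relying on column positions, which shift when the VERSION column is empty, as it is for monitoring above.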

[root@localhost log]# oc get nodes
NAME                   STATUS   ROLES                  AGE     VERSION
master-0.timeocp.com   Ready    control-plane,master   3d20h   v1.25.4+77bec7a
master-1.timeocp.com   Ready    control-plane,master   3d20h   v1.25.4+77bec7a
master-2.timeocp.com   Ready    control-plane,master   3d19h   v1.25.4+77bec7a
worker-0.timeocp.com   Ready    worker                 3d19h   v1.25.4+77bec7a

[root@localhost log]# oc get csr
No resources found

[root@localhost log]# oc get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-2               Healthy   {"health":"true","reason":""}
etcd-0               Healthy   {"health":"true","reason":""}
etcd-1               Healthy   {"health":"true","reason":""}
etcd-3               Healthy   {"health":"true","reason":""}

[root@localhost ocp-install]# ssh -i sshkey core@master-0
When I try to log in to a master or worker node this way, SSH fails with "Host key verification failed."
Following a suggestion from the RH community, I ran curl against the following URL to debug, and it returned an SSL error.
[root@localhost ocp-install]# !curl
curl https://oauth-openshift.apps.ocp.timeocp.com/healthz
curl: (60) SSL certificate problem: self-signed certificate in certificate chain
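
The number in parentheses in the curl output is curl's exit code, which helps separate a certificate trust problem from a DNS or connectivity problem. A minimal sketch of that mapping follows; the function name explain_curl_exit and the wording of the messages are made up for illustration, and only a handful of codes from the curl man page are covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a curl exit code into a rough diagnosis.
# Only a few codes from the EXIT CODES section of the curl man page are mapped.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: failed to connect" ;;
    28) echo "network: operation timed out" ;;
    60) echo "tls: certificate not trusted by the local CA store" ;;
    *)  echo "other: see the EXIT CODES section of the curl man page" ;;
  esac
}

# Assumed usage with the URL from this post:
# curl --silent --output /dev/null https://oauth-openshift.apps.ocp.timeocp.com/healthz
# explain_curl_exit "$?"
```

Code 60 means the route resolved and answered but presented a certificate chain that the local machine does not trust. On a fresh install that still uses the cluster-generated ingress certificate this is likely expected from an outside host, so it may be a separate matter from the degraded operators.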
I am not sure where I am going wrong. I could not fix this even after spending a full day browsing through RH community posts.

Any hints would be really helpful.

Thank you
Suresh

Responses