cluster-baremetal-operator is in CrashLoopBackOff state

Solution Verified

Environment

  • Red Hat OpenShift Container Platform 4

Issue

  • The cluster-baremetal-operator pod is in the CrashLoopBackOff state.
  • The pod logs show the following error:
    "unable to create controller" err="could not lookupIP for internal APIServer: api-int.aro.adwexaz.local: lookup api-int.aro.adwexaz.local: no such host" controller="Provisioning"

Resolution

  • To keep the operator out of the CrashLoopBackOff state, make the name api-int.<cluster_name>.<base_domain> resolvable on every DNS server configured as a forwarder, either through recursive resolution or by adding a DNS record for it.

  • To get the IP address that the record should point to, run the nslookup command from one of the cluster nodes:
    $ oc debug node/<node_name> -- chroot /host nslookup api-int.<cluster_name>.<base_domain>

  • After applying the DNS changes, restart the operator pod by deleting it. Once the recreated pod can resolve the api-int name, it should start normally and leave the CrashLoopBackOff state.
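The resolution steps above can be sketched as a few shell helpers. This is a minimal sketch, not an official procedure: the function names api_int_name, resolve_from_node and restart_cbo are hypothetical, the openshift-machine-api namespace is an assumption about where the operator runs (verify it on your cluster), and resolve_from_node assumes nslookup is available on the node, as the step above implies.

```shell
#!/bin/sh
# Hypothetical helpers sketching the resolution steps; adjust to your cluster.

# Assumed namespace of the cluster-baremetal-operator Deployment.
CBO_NAMESPACE="openshift-machine-api"

# Build the internal API name: api-int.<cluster_name>.<base_domain>
api_int_name() {
  printf 'api-int.%s.%s\n' "$1" "$2"
}

# Resolve the name from a node using the host's own resolver.
# The address printed here is what the record on the forwarders should return.
resolve_from_node() {
  oc debug "node/$1" -- chroot /host nslookup "$2"
}

# Delete the operator pod(s) so the Deployment recreates them.
# Run this only after the forwarders can resolve the api-int name.
# (xargs -r skips the delete when no pod matches; GNU xargs option.)
restart_cbo() {
  oc get pods -n "$CBO_NAMESPACE" -o name \
    | grep '/cluster-baremetal-operator-' \
    | xargs -r oc delete -n "$CBO_NAMESPACE"
}
```

For example, resolve_from_node <node_name> "$(api_int_name aro adwexaz.local)" queries the name from this article; restart_cbo is meant to be run only once the forwarders answer for that name.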

Root Cause

  • The Cluster Baremetal Operator is installed by the OpenShift Cluster Version Operator, but it is only active when the cluster is installed on bare metal. On other platforms, the operator detects that it is not running on bare metal and disables itself. However, to perform that check, it first needs to resolve the name api-int.<cluster_name>.<base_domain>.

  • The cluster has custom DNS forwarders configured for the customer DNS zone (for example, adwexaz.local), and those forwarders could not resolve the api-int name. Because of this custom DNS misconfiguration, the lookup fails and the cluster-baremetal-operator pod enters the CrashLoopBackOff state.
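Both conditions described in this section can be checked from the CLI. The following is a minimal sketch, assuming the forwarders are configured through the OpenShift DNS operator (they may instead be set at the network level): platform_type reads the platform from the infrastructure resource, show_dns_forwarders prints the DNS operator configuration whose spec.servers list holds the custom zones and their upstreams, and the hypothetical name_in_zone helper shows why the api-int name is sent to those forwarders: it falls inside the forwarded zone (adwexaz.local in this example). All function names are illustrative, not part of OpenShift.

```shell
#!/bin/sh
# Hypothetical root-cause checks; function names are illustrative only.

# Print the platform type (for example BareMetal, Azure, AWS).
platform_type() {
  oc get infrastructure.config.openshift.io/cluster \
    -o jsonpath='{.status.platformStatus.type}'
}

# Print the DNS operator configuration; spec.servers lists each custom
# zone and the upstream forwarders that queries for it are sent to.
show_dns_forwarders() {
  oc get dns.operator.openshift.io/default -o yaml
}

# Return success if name $1 equals zone $2 or is a subdomain of it.
# Trailing dots are ignored; names are assumed to be lowercase.
name_in_zone() {
  name="${1%.}"
  zone="${2%.}"
  case "$name" in
    "$zone" | *".$zone") return 0 ;;
    *) return 1 ;;
  esac
}
```

With the values from this article, name_in_zone api-int.aro.adwexaz.local adwexaz.local succeeds, so the lookup goes to the custom forwarders, which must therefore be able to answer for it.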

Diagnostic Steps

  • Check the logs of the cluster-baremetal-operator container:
    $ oc logs -n openshift-machine-api -c cluster-baremetal-operator cluster-baremetal-operator-XXXXXXXX-XXXX
    I0712 13:57:34.461849       1 request.go:655] Throttling request took 1.032561003s, request: GET:https://10.244.0.1:443/apis/coordination.k8s.io/v1beta1?timeout=32s
    I0712 13:57:36.589774       1 listener.go:44] controller-runtime/metrics "msg"="metrics server is starting to listen"  "addr"=":8080"
    I0712 13:57:36.609522       1 webhook.go:114] WebhookDependenciesReady: everything ready for webhooks
    E0712 13:57:36.648157       1 main.go:117] "unable to create controller" err="could not lookupIP for internal APIServer: api-int.aro.adwexaz.local: lookup api-int.aro.adwexaz.local: no such host" controller="Provisioning"
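To pull the unresolved host name out of these logs, the sketch below wraps the oc logs call and extracts the name with sed. It is a minimal sketch: the function names are hypothetical, the openshift-machine-api namespace is assumed, and --previous is used because a crash-looping container's useful output is usually in its previous (crashed) instance.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers for the cluster-baremetal-operator.

# Print the logs of the previous (crashed) container instance.
# Assumes the operator Deployment lives in openshift-machine-api.
cbo_previous_logs() {
  oc logs -n openshift-machine-api deployment/cluster-baremetal-operator \
    -c cluster-baremetal-operator --previous
}

# Read log lines on stdin and print the host name that could not be
# resolved, taken from "could not lookupIP for internal APIServer: <name>:".
extract_unresolved_host() {
  sed -n 's/.*could not lookupIP for internal APIServer: \([^:]*\):.*/\1/p'
}
```

For the log output shown above, cbo_previous_logs | extract_unresolved_host prints api-int.aro.adwexaz.local, the name that the DNS forwarders must be able to resolve.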

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
