In an RHV IPI environment, a newly added node gets stuck in the 'Provisioned' state and is not added to the cluster.
Issue
- Initially, the kubelet logs showed:
kubelet_node_status.go:92] Unable to register node "failing-worker0-node" with API server: Post https://api-int.<cluster-name>.<subdomain>:6443/api/v1/nodes: dial tcp: lookup api-int.<cluster-name>.<subdomain> on <externald-dns-server-IP>:53: no such host
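The "no such host" error above indicates the node cannot resolve the internal API endpoint. As a first check, DNS resolution for `api-int` can be verified from a host on the same network; the hostnames below are placeholders for the actual cluster domain:

```shell
# Replace <cluster-name>.<subdomain> with the actual cluster domain.
# An empty result means the record is missing from the DNS server the node uses.
dig +short api-int.<cluster-name>.<subdomain>

# Alternatively, query the specific DNS server the node is configured with:
nslookup api-int.<cluster-name>.<subdomain> <dns-server-IP>
```

If the lookup fails, correct the `api-int` record on the DNS server referenced in the kubelet error before investigating further.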
- After the DNS resolution issue was fixed, the node still remained in the 'Provisioned' state, with the following errors:
$ oc logs machine-api-controllers-xxxxx -c machine-controller | grep failing-worker0-node
{"level":"error","ts":1601989711.119791,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"machine_controller","request":"openshift-machine-api/failing-worker0-node","error":"Aborting reconciliation while VM failing-worker0-node state is reboot_in_progress","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/cluster-api-provider-ovirt/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/cluster-api-provider-ovirt/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/cluster-api-provider-ovirt/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
I1006 13:08:51.600245 1 controller.go:164] Reconciling Machine "failing-worker0-node"
I1006 13:08:51.600367 1 controller.go:376] Machine "failing-worker0-node" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I1006 13:08:51.649498 1 controller.go:284] Reconciling machine "failing-worker0-node" triggers idempotent update
E1006 13:08:51.694577 1 controller.go:286] Error updating machine "openshift-machine-api/failing-worker0-node": Aborting reconciliation while VM failing-worker0-node state is reboot_in_progress
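The `reboot_in_progress` state in the errors above is reported by the oVirt provider, meaning the machine controller is waiting for the VM reboot to finish on the RHV side. The stuck Machine's phase and the provider's view of the VM can be inspected with commands like the following (the Machine name is illustrative, taken from the logs above):

```shell
# List all machines and their phases; the affected node typically shows 'Provisioned'.
oc get machines -n openshift-machine-api

# Print only the phase of the affected Machine.
oc get machine failing-worker0-node -n openshift-machine-api \
  -o jsonpath='{.status.phase}{"\n"}'

# Inspect the full Machine status, including the provider-reported VM state.
oc get machine failing-worker0-node -n openshift-machine-api -o yaml
```

If the Machine stays in this state, verify in the RHV Administration Portal that the corresponding VM actually completes its reboot and reaches the 'Up' status; reconciliation resumes once the VM leaves `reboot_in_progress`.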
Environment
- Red Hat OpenShift Container Platform 4.5
- IPI-based installation on Red Hat Virtualization (RHV)