How to find the current Kubernetes controller manager (KCM) in Red Hat OpenShift Container Platform 4.x

Solution In Progress - Updated -

Environment

Red Hat OpenShift Container Platform 4.x

Issue

How to find the current Kubernetes controller manager (KCM) in Red Hat OpenShift Container Platform 4.x

Resolution

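The currently active kube-controller-manager (KCM) is recorded in the leader annotation of the kube-controller-manager ConfigMap in the kube-system namespace. Display it with: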

oc get cm/kube-controller-manager -o yaml -n kube-system

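For example, the holderIdentity field of the leader annotation below shows that the instance on osc-j9qk9-master-2 currently holds the lease: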

[cloud-user@jump-server openshift]$ oc get cm/kube-controller-manager -o yaml -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"osc-j9qk9-master-2_140c0882-1ce8-4b87-a0b9-4b792341a59c","leaseDurationSeconds":15,"acquireTime":"2020-06-13T19:26:49Z","renewTime":"2020-06-18T13:12:36Z","leaderTransitions":18}'
  creationTimestamp: "2020-06-09T09:43:24Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "5058158"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-controller-manager
  uid: 9ac67e5c-2572-4f31-9d75-7b65d330b849
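
To read only the leader out of that annotation, the holderIdentity field can be extracted with standard text tools. The following is a minimal sketch, not an official tool: the helper names leader_holder and leader_node are hypothetical, and it assumes the holder identity has the form <node>_<uuid>, as in the example above.

```shell
# Hypothetical helpers (not part of oc): both read the leader annotation JSON on stdin.

# Print the holderIdentity value from the leader annotation.
leader_holder() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Print the node name, assuming holderIdentity looks like <node>_<uuid>.
leader_node() {
  leader_holder | sed 's/_[^_]*$//'
}

# Usage against a cluster (requires a logged-in oc session):
#   oc get cm/kube-controller-manager -n kube-system \
#     -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' | leader_node
```

With the example annotation above, leader_node would print osc-j9qk9-master-2.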

The current KCM leader can also be found in the OCP web console: select Monitoring -> Metrics and search for leader_election_master_status. An instance reporting the value 1 is the leader, and an instance reporting 0 is a standby. The same query also returns the kube-scheduler instances. The result will look similar to:

kube-controller-manager https   192.168.0.14:10257  kube-controller-manager kube-controller-manager openshift-kube-controller-manager   kube-controller-manager-osc-j9qk9-master-1  openshift-monitoring/k8s    kube-controller-manager 0
kube-scheduler  https   192.168.0.14:10259  scheduler   kube-scheduler  openshift-kube-scheduler    openshift-kube-scheduler-osc-j9qk9-master-1 openshift-monitoring/k8s    scheduler   0
kube-controller-manager https   192.168.0.16:10257  kube-controller-manager kube-controller-manager openshift-kube-controller-manager   kube-controller-manager-osc-j9qk9-master-0  openshift-monitoring/k8s    kube-controller-manager 0
kube-scheduler  https   192.168.0.16:10259  scheduler   kube-scheduler  openshift-kube-scheduler    openshift-kube-scheduler-osc-j9qk9-master-0 openshift-monitoring/k8s    scheduler   0
kube-controller-manager https   192.168.0.38:10257  kube-controller-manager kube-controller-manager openshift-kube-controller-manager   kube-controller-manager-osc-j9qk9-master-2  openshift-monitoring/k8s    kube-controller-manager 1
kube-scheduler  https   192.168.0.38:10259  scheduler   kube-scheduler  openshift-kube-scheduler    openshift-kube-scheduler-osc-j9qk9-master-2 openshift-monitoring/k8s    scheduler   1
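
If the result rows are copied out of the console into a text file, the leader can be picked out with a small filter. This is only a sketch: the function name kcm_leader_pod is hypothetical, and it assumes the same ten whitespace-separated columns as the example above, with the pod name in the seventh column and the metric value in the last.

```shell
# Hypothetical helper: read pasted leader_election_master_status rows on stdin
# and print the pod (7th column) of every kube-controller-manager row whose
# value (last column) is 1. The column positions are an assumption taken from
# the example output, not a documented console format.
kcm_leader_pod() {
  awk '$1 == "kube-controller-manager" && $NF == 1 { print $7 }'
}

# Usage, with the rows saved in a file named metrics.txt (hypothetical name):
#   kcm_leader_pod < metrics.txt
```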

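The leader can also be identified from the KCM pod logs. Only the leader runs the controllers, so the pod whose log shows recent controller activity is the leader, while the logs of the other pods stop at leader-election messages. In the example below, master-0 and master-1 last logged leader-election attempts on June 13, whereas master-2 is still logging controller activity on June 18: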

[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 | tail -n5
I0613 19:43:03.909880       1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1592077383" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1592077383" (2020-06-13 18:43:03 +0000 UTC to 2021-06-13 18:43:03 +0000 UTC (now=2020-06-13 19:43:03.909845591 +0000 UTC))
I0613 19:43:03.909937       1 secure_serving.go:178] Serving securely on [::]:10257
I0613 19:43:03.910005       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/kube-controller-manager...
I0613 19:43:03.911035       1 tlsconfig.go:241] Starting DynamicServingCertificateController
E0613 19:43:06.851463       1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: configmaps "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-1 | tail -n5
I0613 19:26:38.430174       1 tlsconfig.go:241] Starting DynamicServingCertificateController
E0613 19:26:38.431594       1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:42.379998       1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:45.482819       1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:51.865225       1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: configmaps "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 | tail -n5
I0618 16:00:13.757796       1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/prometheus-operator: Operation cannot be fulfilled on deployments.apps "prometheus-operator": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:24.162296       1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/telemeter-client: Operation cannot be fulfilled on deployments.apps "telemeter-client": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:26.760618       1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/thanos-querier: Operation cannot be fulfilled on deployments.apps "thanos-querier": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:27.944676       1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/prometheus-adapter: Operation cannot be fulfilled on deployments.apps "prometheus-adapter": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:33.358075       1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/grafana: Operation cannot be fulfilled on deployments.apps "grafana": the object has been modified; please apply your changes to the latest version and try again
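
Comparing the timestamp of the last log line of each pod makes this difference easy to see. The following is a minimal sketch, assuming the log line format shown above (a severity letter, the date as MMDD, then the time); last_log_stamp is a hypothetical helper name.

```shell
# Hypothetical helper: print the "MMDD HH:MM:SS.micros" stamp of the last
# log line read from stdin. Prints nothing if the last line does not start
# with a severity letter (I, W, E or F) followed by a four-digit date.
last_log_stamp() {
  tail -n 1 | sed -n 's/^[IWEF]\([0-9]\{4\} [0-9:.]*\).*/\1/p'
}

# Usage against a cluster (requires a logged-in oc session):
#   oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 | last_log_stamp
```

The pod with the most recent stamp is the likely leader. This is a heuristic and should be confirmed against the ConfigMap annotation.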

Root Cause

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

For upstream Kubernetes, leader election of the kube-controller-manager is coordinated either via the original endpoints method or via ConfigMaps. For details about endpoint-based election, see the following blog post: https://blog.heptio.com/leader-election-in-kubernetes-control-plane-heptioprotip-1ed9fb0f3e6d

However, in OCP the leader election is based on ConfigMaps. No kube-controller-manager endpoint exists in the kube-system namespace:

[cloud-user@jump-server openshift]$ oc -n kube-system get ep 
NAME             ENDPOINTS                                                               AGE
kube-scheduler   <none>                                                                  9d
kubelet          192.168.0.12:10255,192.168.0.14:10255,192.168.0.16:10255 + 15 more...   9d

List all running kube-controller-manager pods:

[cloud-user@jump-server openshift]$ oc get pods -A | grep kube-controller-manager | grep Running
openshift-kube-controller-manager-operator              kube-controller-manager-operator-784c96d5bd-k7pxk                 1/1     Running       2          4d20h
openshift-kube-controller-manager                       kube-controller-manager-osc-j9qk9-master-0                 4/4     Running       0          9d
openshift-kube-controller-manager                       kube-controller-manager-osc-j9qk9-master-1                 4/4     Running       0          9d
openshift-kube-controller-manager                       kube-controller-manager-osc-j9qk9-master-2                 4/4     Running       0          9d

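The leader-election flags logged at KCM startup confirm that the resource lock is a ConfigMap named kube-controller-manager in the kube-system namespace: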

[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 | grep FLAG
(...)
I0613 19:43:03.629758       1 flags.go:33] FLAG: --leader-elect="true"
I0613 19:43:03.629780       1 flags.go:33] FLAG: --leader-elect-lease-duration="15s"
I0613 19:43:03.629834       1 flags.go:33] FLAG: --leader-elect-renew-deadline="10s"
I0613 19:43:03.629852       1 flags.go:33] FLAG: --leader-elect-resource-lock="configmaps"
I0613 19:43:03.629868       1 flags.go:33] FLAG: --leader-elect-resource-name="kube-controller-manager"
I0613 19:43:03.629883       1 flags.go:33] FLAG: --leader-elect-resource-namespace="kube-system"
I0613 19:43:03.629898       1 flags.go:33] FLAG: --leader-elect-retry-period="3s"
(...)
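
To reduce the startup log to just the leader-election settings, the FLAG lines can be filtered and reformatted. The following is a minimal sketch, assuming the FLAG: --name="value" log format shown above; leader_elect_flags is a hypothetical helper name.

```shell
# Hypothetical helper: read KCM log lines on stdin and print every
# --leader-elect* flag as name=value, dropping the log prefix and quotes.
leader_elect_flags() {
  sed -n 's/.*FLAG: --\(leader-elect[^=]*\)="\([^"]*\)".*/\1=\2/p'
}

# Usage against a cluster (requires a logged-in oc session):
#   oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 | leader_elect_flags
```

For the log excerpt above, the output would include leader-elect-resource-lock=configmaps, together with the 15s lease duration, 10s renew deadline and 3s retry period.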

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
