How to find the currently active Kubernetes controller manager (KCM) in Red Hat OpenShift Container Platform 4.x
Environment
Red Hat OpenShift Container Platform 4.x
Issue
How to find the currently active Kubernetes controller manager (KCM) in Red Hat OpenShift Container Platform 4.x
Resolution
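To identify the currently active Kubernetes controller manager (KCM), inspect the leader-election ConfigMap. The holderIdentity field of the control-plane.alpha.kubernetes.io/leader annotation names the current leader: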
oc get cm/kube-controller-manager -o yaml -n kube-system
For example:
[cloud-user@jump-server openshift]$ oc get cm/kube-controller-manager -o yaml -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"osc-j9qk9-master-2_140c0882-1ce8-4b87-a0b9-4b792341a59c","leaseDurationSeconds":15,"acquireTime":"2020-06-13T19:26:49Z","renewTime":"2020-06-18T13:12:36Z","leaderTransitions":18}'
  creationTimestamp: "2020-06-09T09:43:24Z"
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "5058158"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-controller-manager
  uid: 9ac67e5c-2572-4f31-9d75-7b65d330b849
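The leader can be read out of that annotation without scanning the YAML by eye. The following is a minimal sketch, not a supported tool: holder_identity and node_of are hypothetical helper names, and the sample value is the annotation from the output above. It assumes the holder identity has the form <node name>_<uuid>, as in that output.

```shell
#!/bin/sh
# Sketch only: holder_identity and node_of are hypothetical helper names.

# Read the leader annotation JSON on stdin and print its holderIdentity field.
holder_identity() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Strip the trailing "_<uuid>" suffix, leaving the node name
# (assumes the <node name>_<uuid> form seen in the output above).
node_of() {
  printf '%s\n' "${1%_*}"
}

# Annotation value copied from the example output above.
sample='{"holderIdentity":"osc-j9qk9-master-2_140c0882-1ce8-4b87-a0b9-4b792341a59c","leaseDurationSeconds":15,"acquireTime":"2020-06-13T19:26:49Z","renewTime":"2020-06-18T13:12:36Z","leaderTransitions":18}'

holder=$(printf '%s\n' "$sample" | holder_identity)
echo "holder: $holder"
echo "node:   $(node_of "$holder")"
```

On a live cluster, the annotation value could be piped into holder_identity instead of the sample. The following command should print it, assuming the ConfigMap lock shown above is in use:
oc get cm kube-controller-manager -n kube-system -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'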
One can also find the current KCM leader in the OCP web console: select Monitoring -> Metrics and search for leader_election_master_status. The last column of each result row is the metric value, and the instance reporting 1 is the leader. The result will look similar to:
kube-controller-manager https 192.168.0.14:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-1 openshift-monitoring/k8s kube-controller-manager 0
kube-scheduler https 192.168.0.14:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-1 openshift-monitoring/k8s scheduler 0
kube-controller-manager https 192.168.0.16:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 openshift-monitoring/k8s kube-controller-manager 0
kube-scheduler https 192.168.0.16:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-0 openshift-monitoring/k8s scheduler 0
kube-controller-manager https 192.168.0.38:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 openshift-monitoring/k8s kube-controller-manager 1
kube-scheduler https 192.168.0.38:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-2 openshift-monitoring/k8s scheduler 1
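If the result rows are copied out of the console as whitespace-separated text, the leader row can be picked out mechanically. This is a rough sketch under that assumption. leader_pod is a hypothetical helper name, and the column positions (pod name fourth from the end, value last) are inferred from the sample rows above, not from a documented export format.

```shell
#!/bin/sh
# Sketch only: leader_pod is a hypothetical helper name.
# Prints the pod column of every row whose first column matches $1
# and whose last column (the metric value) is 1.
leader_pod() {
  awk -v name="$1" '$1 == name && $NF == 1 { print $(NF - 3) }'
}

# Rows copied from the example result above.
rows='kube-controller-manager https 192.168.0.14:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-1 openshift-monitoring/k8s kube-controller-manager 0
kube-scheduler https 192.168.0.14:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-1 openshift-monitoring/k8s scheduler 0
kube-controller-manager https 192.168.0.16:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 openshift-monitoring/k8s kube-controller-manager 0
kube-scheduler https 192.168.0.16:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-0 openshift-monitoring/k8s scheduler 0
kube-controller-manager https 192.168.0.38:10257 kube-controller-manager kube-controller-manager openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 openshift-monitoring/k8s kube-controller-manager 1
kube-scheduler https 192.168.0.38:10259 scheduler kube-scheduler openshift-kube-scheduler openshift-kube-scheduler-osc-j9qk9-master-2 openshift-monitoring/k8s scheduler 1'

printf '%s\n' "$rows" | leader_pod kube-controller-manager
printf '%s\n' "$rows" | leader_pod kube-scheduler
```

In this sample, both the KCM and the scheduler leader run on master-2. The two components hold separate leases, so they may be on different nodes in other clusters.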
It is also possible to determine the current KCM leader from the KCM logs. The leader is the instance logging ongoing controller activity; the other instances remain at "attempting to acquire leader lease" or log errors while retrieving the resource lock:
[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 | tail -n5
I0613 19:43:03.909880 1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1592077383" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1592077383" (2020-06-13 18:43:03 +0000 UTC to 2021-06-13 18:43:03 +0000 UTC (now=2020-06-13 19:43:03.909845591 +0000 UTC))
I0613 19:43:03.909937 1 secure_serving.go:178] Serving securely on [::]:10257
I0613 19:43:03.910005 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
I0613 19:43:03.911035 1 tlsconfig.go:241] Starting DynamicServingCertificateController
E0613 19:43:06.851463 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: configmaps "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-1 | tail -n5
I0613 19:26:38.430174 1 tlsconfig.go:241] Starting DynamicServingCertificateController
E0613 19:26:38.431594 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:42.379998 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:45.482819 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-controller-manager?timeout=10s: dial tcp [::1]:6443: connect: connection refused
E0613 19:26:51.865225 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: configmaps "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 | tail -n5
I0618 16:00:13.757796 1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/prometheus-operator: Operation cannot be fulfilled on deployments.apps "prometheus-operator": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:24.162296 1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/telemeter-client: Operation cannot be fulfilled on deployments.apps "telemeter-client": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:26.760618 1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/thanos-querier: Operation cannot be fulfilled on deployments.apps "thanos-querier": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:27.944676 1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/prometheus-adapter: Operation cannot be fulfilled on deployments.apps "prometheus-adapter": the object has been modified; please apply your changes to the latest version and try again
I0618 16:00:33.358075 1 deployment_controller.go:484] Error syncing deployment openshift-monitoring/grafana: Operation cannot be fulfilled on deployments.apps "grafana": the object has been modified; please apply your changes to the latest version and try again
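The same log check can be roughly automated by searching each pod's full log for the lease-acquisition message. This is a sketch under assumptions, not a definitive test: kcm_log_role is a hypothetical helper name, and it assumes the leader's log contains the line "successfully acquired lease", which is the message the upstream leader-election client is understood to print and which may differ between versions. The leader sample below is synthetic; the standby sample is taken from the master-0 output above.

```shell
#!/bin/sh
# Sketch only: kcm_log_role is a hypothetical helper name.
# Reads a KCM log on stdin and prints "leader" if it contains the
# lease-acquisition message (assumed wording), otherwise "standby".
kcm_log_role() {
  if grep -q 'successfully acquired lease'; then
    echo leader
  else
    echo standby
  fi
}

# Standby sample: lines taken from the master-0 log shown above.
standby_log='I0613 19:43:03.910005 1 leaderelection.go:242] attempting to acquire leader lease kube-system/kube-controller-manager...
E0613 19:43:06.851463 1 leaderelection.go:331] error retrieving resource lock kube-system/kube-controller-manager: configmaps "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "configmaps" in API group "" in the namespace "kube-system"'

# Leader sample: SYNTHETIC illustration, not output from the article.
leader_log='attempting to acquire leader lease kube-system/kube-controller-manager...
successfully acquired lease kube-system/kube-controller-manager'

printf '%s\n' "$standby_log" | kcm_log_role
printf '%s\n' "$leader_log" | kcm_log_role
```

Against a cluster, the output of each oc logs command shown above could be piped into kcm_log_role without the tail filter, since the acquisition message is logged once and is unlikely to be among the last five lines.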
Root Cause
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
For upstream Kubernetes, leader election for the Kubernetes controller manager is coordinated either through the original Endpoints method or through ConfigMaps. For details about Endpoints-based election, see the following blog post: https://blog.heptio.com/leader-election-in-kubernetes-control-plane-heptioprotip-1ed9fb0f3e6d
In OCP, however, leader election is based on ConfigMaps. Accordingly, the kube-system namespace contains no kube-controller-manager endpoint:
[cloud-user@jump-server openshift]$ oc -n kube-system get ep
NAME ENDPOINTS AGE
kube-scheduler <none> 9d
kubelet 192.168.0.12:10255,192.168.0.14:10255,192.168.0.16:10255 + 15 more... 9d
Show all kube controller manager pods:
[cloud-user@jump-server openshift]$ oc get pods -A | grep kube-controller-manager | grep Running
openshift-kube-controller-manager-operator kube-controller-manager-operator-784c96d5bd-k7pxk 1/1 Running 2 4d20h
openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 4/4 Running 0 9d
openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-1 4/4 Running 0 9d
openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-2 4/4 Running 0 9d
Show the leader-election flags:
[cloud-user@jump-server openshift]$ oc logs -n openshift-kube-controller-manager kube-controller-manager-osc-j9qk9-master-0 | grep FLAG
(...)
I0613 19:43:03.629758 1 flags.go:33] FLAG: --leader-elect="true"
I0613 19:43:03.629780 1 flags.go:33] FLAG: --leader-elect-lease-duration="15s"
I0613 19:43:03.629834 1 flags.go:33] FLAG: --leader-elect-renew-deadline="10s"
I0613 19:43:03.629852 1 flags.go:33] FLAG: --leader-elect-resource-lock="configmaps"
I0613 19:43:03.629868 1 flags.go:33] FLAG: --leader-elect-resource-name="kube-controller-manager"
I0613 19:43:03.629883 1 flags.go:33] FLAG: --leader-elect-resource-namespace="kube-system"
I0613 19:43:03.629898 1 flags.go:33] FLAG: --leader-elect-retry-period="3s"
(...)
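To read a single flag instead of scanning the whole FLAG dump, the value can be extracted by name. A minimal sketch follows: flag_value is a hypothetical helper name, and the sample lines are copied from the log output above.

```shell
#!/bin/sh
# Sketch only: flag_value is a hypothetical helper name.
# Reads KCM log lines on stdin and prints the value logged for flag --$1.
flag_value() {
  sed -n "s/.*FLAG: --$1=\"\([^\"]*\)\".*/\1/p"
}

# FLAG lines copied from the example log output above.
flags='I0613 19:43:03.629758 1 flags.go:33] FLAG: --leader-elect="true"
I0613 19:43:03.629780 1 flags.go:33] FLAG: --leader-elect-lease-duration="15s"
I0613 19:43:03.629852 1 flags.go:33] FLAG: --leader-elect-resource-lock="configmaps"
I0613 19:43:03.629868 1 flags.go:33] FLAG: --leader-elect-resource-name="kube-controller-manager"
I0613 19:43:03.629883 1 flags.go:33] FLAG: --leader-elect-resource-namespace="kube-system"'

echo "lock type: $(printf '%s\n' "$flags" | flag_value leader-elect-resource-lock)"
echo "lock name: $(printf '%s\n' "$flags" | flag_value leader-elect-resource-name)"
```

Because the pattern includes the = sign and the opening quote, asking for leader-elect matches only that flag and not the longer leader-elect-* names.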
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.