Configuration conflict between Red Hat GitOps Operator and GitLab Runner Operator
Environment
- Red Hat OpenShift Service on AWS (ROSA)
  - 4.x
- Red Hat OpenShift GitOps
  - 1.5.4 provided by Red Hat Inc
Issue
- The GitOps Operator breaks when the GitLab Runner 1.9.0 Operator is also installed on the cluster.
- A pod selector matches pods installed by both operators, which causes intermittent deployment errors.
Resolution
- This is a bug in the GitOps Operator that has been fixed in GitOps release 1.7.0.
Root Cause
- Both the GitLab Operator Pods and the Red Hat OpenShift GitOps Pods use the label control-plane: controller-manager. As a result, the GitLab Kubernetes Service sometimes routes traffic to the GitOps Pods, and attempts to create or modify Runner YAML resources fail.
- The selector should have targeted more labels so that it matched only the intended Pods.
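The selector overlap described above can be illustrated with a small Python sketch of equality-based label matching, where a pod is selected only if it carries every key/value pair in the selector. This is an illustration under stated assumptions, not the operators' actual manifests: the app.kubernetes.io/name labels and the narrowed selector are hypothetical, and only control-plane: controller-manager comes from this article.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, pod_labels: Labels) -> bool:
    """Equality-based matching: every selector pair must be present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def selected_pods(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of all pods that a Service with this selector would target."""
    return sorted(name for name, labels in pods.items() if selector_matches(selector, labels))


# Both pods share the control-plane label named in the article.
# The app.kubernetes.io/name values are hypothetical, added only for illustration.
PODS: Dict[str, Labels] = {
    "gitlab-runner-controller-manager": {
        "control-plane": "controller-manager",
        "app.kubernetes.io/name": "gitlab-runner-operator",  # hypothetical label
    },
    "gitops-operator-controller-manager": {
        "control-plane": "controller-manager",
        "app.kubernetes.io/name": "gitops-operator",  # hypothetical label
    },
}

# A selector with only the shared label matches pods from both operators.
broad_selector = {"control-plane": "controller-manager"}

# Adding one more (hypothetical) label narrows the match to a single operator.
narrow_selector = {**broad_selector, "app.kubernetes.io/name": "gitlab-runner-operator"}

print(selected_pods(broad_selector, PODS))
# ['gitlab-runner-controller-manager', 'gitops-operator-controller-manager']
print(selected_pods(narrow_selector, PODS))
# ['gitlab-runner-controller-manager']
```

With the broad selector, traffic for the GitLab webhook Service can land on either pod, which is consistent with failures that appear only some of the time.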
Diagnostic Steps
- Check the pods
$ oc get po
NAME READY STATUS RESTARTS AGE
gitlab-runner-controller-manager-xxxxxxxxxxxx-xxxxx 2/2 Running 0 15d
gitops-operator-controller-manager-xxxxxxxxxxx-xxxxxx 1/1 Running 0 15d
- Check the labels of the services
$ oc get svc --show-labels
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE LABELS
gitlab-runner-xxxxx-xxxxxxx-xxxxx-service ClusterIP 172.xx.xxx.xxx <none> 8443/TCP 30d control-plane=controller-manager,operators.coreos.com/gitlab-runner-operator.openshift-operators=
gitlab-runner-xxxxx-xxxxxxx-xxxxx-service ClusterIP 172.xx.xxx.xxx <none> 443/TCP 15d operators.coreos.com/gitlab-runner-operator.openshift-operators=
gitlab-runner-xxxxx-service ClusterIP 172.xx.xxx.xxx <none> 443/TCP 30d operators.coreos.com/gitlab-runner-operator.openshift-operators=
- Check the GitLab Runner operator logs
2022-07-20T12:30:09.924Z ERROR controller-runtime.manager.controller.runner Reconciler error {"reconciler group": "xxxx.gitlab.com", "reconciler kind": "Runner", "name": "xxxx-xxxx-xxx-x", "namespace": "custom-xxx-xxxxxxx", "error": "Internal error occurred: failed calling webhook \"mrunner.kb.io\": failed to call webhook: Post \"https://gitlab-runner-xxxxxx-xxxxxx-service.openshift-operators.svc:443/xxxx-xxx-gitlab-com-v1beta2-runner?timeout=10s\": dial tcp xx.xxx.xx.xxx:9443: connect: connection refused"}
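When many such log lines need to be reviewed, the relevant fields can be pulled out programmatically. The following Python sketch parses one reconciler error line of the shape shown above and extracts the webhook name, the Service host and port that were called, and the port on which the connection was refused. The function name parse_webhook_error and the returned field names are assumptions made for illustration. The sketch also assumes the structured part of the line is a trailing JSON object, as in the sample.

```python
import json
import re
from urllib.parse import urlsplit


def parse_webhook_error(line: str) -> dict:
    """Extract webhook details from a reconciler error line (hypothetical helper)."""
    # Assumption: everything from the first "{" onward is a JSON object.
    payload = json.loads(line[line.index("{"):])
    error = payload.get("error", "")

    webhook = re.search(r'failed calling webhook "([^"]+)"', error)
    url = re.search(r'Post "([^"]+)"', error)
    dial = re.search(r"dial tcp (\S+):(\d+): connect: connection refused", error)

    target = urlsplit(url.group(1)) if url else None
    return {
        "kind": payload.get("reconciler kind"),
        "webhook": webhook.group(1) if webhook else None,
        "service_host": target.hostname if target else None,
        "service_port": target.port if target else None,
        "refused_port": int(dial.group(2)) if dial else None,
    }


# The sample line from the diagnostic step above, with its masked values kept as-is.
SAMPLE = (
    r'2022-07-20T12:30:09.924Z ERROR controller-runtime.manager.controller.runner Reconciler error '
    r'{"reconciler group": "xxxx.gitlab.com", "reconciler kind": "Runner", "name": "xxxx-xxxx-xxx-x", '
    r'"namespace": "custom-xxx-xxxxxxx", "error": "Internal error occurred: failed calling webhook '
    r'\"mrunner.kb.io\": failed to call webhook: Post '
    r'\"https://gitlab-runner-xxxxxx-xxxxxx-service.openshift-operators.svc:443/'
    r'xxxx-xxx-gitlab-com-v1beta2-runner?timeout=10s\": '
    r'dial tcp xx.xxx.xx.xxx:9443: connect: connection refused"}'
)

result = parse_webhook_error(SAMPLE)
print(result)
```

In the sample, the call is addressed to the Service on port 443 while the refused connection is on port 9443. This is consistent with the Service forwarding the request to a pod that is not serving the webhook, although the log line alone does not identify which pod received it.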
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.