Configuration conflict between Red Hat GitOps Operator and GitLab Runner Operator

Environment

  • Red Hat OpenShift Service on AWS (ROSA)

    • 4.x
  • Red Hat OpenShift GitOps

    • 1.5.4 provided by Red Hat Inc

Issue

  • The GitOps Operator breaks when the GitLab Runner Operator 1.9.0 is also installed on the same cluster.

  • A Service pod selector matches pods installed by both operators, which causes intermittent deployment errors.

Resolution

  • This is a bug in the GitOps Operator that has been fixed in Red Hat OpenShift GitOps 1.7.0.

Root Cause

  • Both the GitLab Runner Operator pods and the Red Hat OpenShift GitOps pods carry the label control-plane: controller-manager. As a result, the GitLab Runner Kubernetes Service sometimes routes traffic to the GitOps pods, and attempts to create or modify Runner YAML resources fail.

  • The Service selector should have targeted additional, operator-specific labels so that it matches only the intended pods.
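
The selector overlap described above can be illustrated with a small Python sketch of equality-based label matching, the rule a Kubernetes Service uses to pick its backend pods. This is a simplified model, not the operators' actual manifests: it assumes the Service selector contains only control-plane: controller-manager, and the extra "app" label and its values are hypothetical, introduced only to show how a second label narrows the match.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, pod_labels: Labels) -> bool:
    """Equality-based matching: every selector key/value must be present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def select_pods(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the sorted names of all pods the selector would pick as backends."""
    return sorted(
        name for name, labels in pods.items() if selector_matches(selector, labels)
    )


# Pod names mirror the article (hash suffixes dropped). The shared
# "control-plane" label comes from the article; the "app" label is a
# hypothetical extra label used only for illustration.
PODS: Dict[str, Labels] = {
    "gitlab-runner-controller-manager": {
        "control-plane": "controller-manager",
        "app": "gitlab-runner-operator",  # hypothetical label
    },
    "gitops-operator-controller-manager": {
        "control-plane": "controller-manager",
        "app": "gitops-operator",  # hypothetical label
    },
}

# Assumed selector: only the label that both operators share.
BROAD_SELECTOR: Labels = {"control-plane": "controller-manager"}

# Same selector plus one operator-specific (hypothetical) label.
NARROW_SELECTOR: Labels = {
    "control-plane": "controller-manager",
    "app": "gitlab-runner-operator",
}

if __name__ == "__main__":
    print("broad selector matches: ", select_pods(BROAD_SELECTOR, PODS))
    print("narrow selector matches:", select_pods(NARROW_SELECTOR, PODS))
```

In this model the broad selector returns both pods, which corresponds to the Service sometimes sending webhook traffic to a GitOps pod. The narrow selector returns only the GitLab Runner pod.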

Diagnostic Steps

  • Check the pods
$ oc get po
NAME                                                   READY    STATUS    RESTARTS   AGE
gitlab-runner-controller-manager-xxxxxxxxxxxx-xxxxx     2/2     Running   0          15d
gitops-operator-controller-manager-xxxxxxxxxxx-xxxxxx   1/1     Running   0          15d
  • Check the labels of the services
$ oc get svc --show-labels
NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE   LABELS
gitlab-runner-xxxxx-xxxxxxx-xxxxx-service   ClusterIP   172.xx.xxx.xxx   <none>        8443/TCP   30d   control-plane=controller-manager,operators.coreos.com/gitlab-runner-operator.openshift-operators=
gitlab-runner-xxxxx-xxxxxxx-xxxxx-service   ClusterIP   172.xx.xxx.xxx   <none>        443/TCP    15d   operators.coreos.com/gitlab-runner-operator.openshift-operators=
gitlab-runner-xxxxx-service                 ClusterIP   172.xx.xxx.xxx   <none>        443/TCP    30d   operators.coreos.com/gitlab-runner-operator.openshift-operators=
  • Check the GitLab Runner Operator logs
2022-07-20T12:30:09.924Z    ERROR   controller-runtime.manager.controller.runner    Reconciler error    {"reconciler group": "xxxx.gitlab.com", "reconciler kind": "Runner", "name": "xxxx-xxxx-xxx-x", "namespace": "custom-xxx-xxxxxxx", "error": "Internal error occurred: failed calling webhook \"mrunner.kb.io\": failed to call webhook: Post \"https://gitlab-runner-xxxxxx-xxxxxx-service.openshift-operators.svc:443/xxxx-xxx-gitlab-com-v1beta2-runner?timeout=10s\": dial tcp xx.xxx.xx.xxx:9443: connect: connection refused"}
