Can't log in to OpenShift after creating an "OperatorGroup" with "name: cluster"

Environment

  • OpenShift Container Platform 4.12

Issue

Creating an OperatorGroup resource with "name: cluster" causes major issues, and we can no longer log in to the cluster. When trying to run the command oc login... the console/oauth endpoint shows:

    {"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"c72af27d"}

Resolution

If a cluster is affected, you may be able to fix it if you still have the install-time kubeconfig (see the example commands after this list):

  • Log in as system:admin using the certificates from the install-time kubeconfig (from the bastion host)
  • Delete the OperatorGroup named cluster that is causing the problem
  • Reboot the cluster
  • The original cluster-admin role should be restored, because the API server reconciles the default cluster roles at startup
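A minimal sketch of the recovery, assuming the install-time kubeconfig is at <install-dir>/auth/kubeconfig and the OperatorGroup was created in <namespace> (both placeholders are illustrative):

    # Authenticate with the client certificates from the install-time kubeconfig
    $ export KUBECONFIG=<install-dir>/auth/kubeconfig
    $ oc whoami          # should report system:admin

    # Locate and delete the offending OperatorGroup
    $ oc get operatorgroups -A
    $ oc delete operatorgroup cluster -n <namespace>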

Root Cause

The issue is caused by OLM overwriting the cluster-admin cluster role, which brings key components down.

When an OperatorGroup is created, OLM creates three cluster roles of the form:

  • <operatorgroup_name>-admin
  • <operatorgroup_name>-edit
  • <operatorgroup_name>-view
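For example, a hypothetical OperatorGroup named my-group (an illustrative name) would result in roles that can be listed like this:

    # The generated roles follow the <operatorgroup_name>-<suffix> pattern,
    # e.g. my-group-admin, my-group-edit, my-group-view
    $ oc get clusterroles | grep '^my-group-'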

The problem is that, for an OperatorGroup named cluster, the generated role is cluster-admin, and a cluster role with that name already exists in the system by default; the built-in role is overwritten by the newly generated one, causing the conflict.

It seems that after checking whether the role already exists and finding that it differs from the desired one, OLM updates the role in place (see the OLM source code).

This overwrites the cluster-admin role with the following:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cluster-admin
    rules: null

While the original cluster-admin is:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      annotations:
        rbac.authorization.kubernetes.io/autoupdate: "true"
      labels:
        kubernetes.io/bootstrapping: rbac-defaults
      name: cluster-admin
    rules:
    - apiGroups:
      - '*'
      resources:
      - '*'
      verbs:
      - '*'
    - nonResourceURLs:
      - '*'
      verbs:
      - '*'

So it breaks access for every user and ServiceAccount that has the cluster-admin role assigned.
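If you still have API access (for example as system:admin via the install-time kubeconfig), the overwrite can be confirmed by inspecting the role:

    # An overwritten cluster-admin role shows "rules: null" instead of the default rules
    $ oc get clusterrole cluster-admin -o yaml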

Diagnostic Steps

Steps to reproduce:

While logged in as cluster-admin, run:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: cluster
    spec: {}
    EOF

After this command, all oc commands fail, including oc login.... The console/oauth endpoint shows:

    {"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"c72af27d"}

Notes:

  • Restarting the cluster doesn't solve the problem; it is persistent.
  • Reproduced with OpenShift 4.12.12 and 4.13.0 in different environments (ROSA, CRC, etc.).
  • Using a different name for the OperatorGroup is a simple workaround; the name cluster seems to be what triggers the problem (see the example after this list).
  • It doesn't matter what namespace the OperatorGroup is created in or what the spec looks like.
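A minimal sketch of the workaround, using a non-conflicting OperatorGroup name (global-operators here is only an illustrative choice):

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: global-operators  # any name that does not generate a role named cluster-admin
    spec: {}
    EOF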

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
