Can't login to OpenShift after creating an "OperatorGroup" having "name: cluster"
Environment
- OpenShift Container Platform 4.12
Issue
Creating an OperatorGroup resource with "name: cluster" causes major issues, and we can no longer log in to the cluster. When running the oc login... command, the console/oauth endpoint shows:
{"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"c72af27d"}
Resolution
If a cluster is affected, you may be able to fix it if you still have the install-time kubeconfig:
- Log in as system:admin using certificates (from the bastion host)
- Delete the OperatorGroup named cluster that is causing the problem
- Reboot the cluster
- The original cluster-admin role should be restored
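The deletion step above could be sketched as the shell function below. This is a minimal sketch, not an official procedure: the function name delete_cluster_operatorgroup and both of its arguments are hypothetical placeholders, since the article does not say where the kubeconfig is stored or which namespace holds the OperatorGroup. The reboot step is not scripted here.

```shell
#!/bin/sh
# Sketch only. Arguments (both are placeholders you must supply):
#   $1  path to the install-time kubeconfig (certificate-based system:admin)
#   $2  namespace in which the OperatorGroup named "cluster" was created
delete_cluster_operatorgroup() {
    kubeconfig_path=$1
    namespace=$2

    # Delete the offending OperatorGroup using certificate authentication,
    # which does not depend on the broken OAuth login flow.
    oc --kubeconfig "$kubeconfig_path" delete operatorgroup cluster -n "$namespace" || return 1

    # Print the role so you can inspect it; a healthy cluster-admin
    # carries wildcard rules rather than "rules: null".
    oc --kubeconfig "$kubeconfig_path" get clusterrole cluster-admin -o yaml
}
```

If you do not remember the namespace, running oc get operatorgroup --all-namespaces with the same kubeconfig should list it.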
Root Cause
The issue is caused by OLM overwriting the cluster-admin role, which brings key components down.
When an OperatorGroup is created, three cluster roles are created in the form:
- <operatorgroup_name>-admin
- <operatorgroup_name>-edit
- <operatorgroup_name>-view
The problem is that the cluster role cluster-admin already exists in the system by default, and it is overwritten by the newly created one.
It appears that when the role already exists and differs from the generated one, OLM updates it (see the linked OLM code).
This code overwrites the cluster-admin role with this:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin
rules: null
While the original cluster-admin is:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
So it breaks access for every User/SA that has the cluster-admin role assigned.
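One way to tell whether a cluster is in this state is to check whether cluster-admin still carries any rules. The following is a minimal sketch, assuming a kubeconfig that can still reach the API (for example the install-time one); the function name cluster_admin_has_rules is hypothetical.

```shell
#!/bin/sh
# Sketch only: returns 0 if cluster-admin has rules, 1 if the rules are
# empty or null (the overwritten state), 2 if the role could not be read.
cluster_admin_has_rules() {
    # Extract only the rules field; an overwritten role yields no output.
    rules=$(oc get clusterrole cluster-admin -o jsonpath='{.rules}') || return 2
    [ -n "$rules" ] && [ "$rules" != "null" ]
}
```

For example: cluster_admin_has_rules || echo "cluster-admin looks overwritten or unreadable".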
Diagnostic Steps
Steps to reproduce:
While logged in as cluster-admin, run:
$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster
spec: {}
EOF
After this command, all oc commands fail, including oc login... The console/oauth endpoint shows:
{"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"c72af27d"}
Notes:
- Restarting the cluster doesn't solve the problem; it is persistent.
- Reproduced with OpenShift 4.12.12 and 4.13.0 in different environments (ROSA, CRC, etc.)
- Using a different name for the OperatorGroup is a simple workaround. The name cluster seems to cause the problem.
- It doesn't matter which namespace the OperatorGroup is created in or what the spec looks like
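Since the workaround is to pick a different name, a pre-flight check along the following lines might help before creating an OperatorGroup. It is a sketch under the assumption, taken from the Root Cause section, that the generated roles are named <operatorgroup_name>-admin, -edit and -view; the function name check_operatorgroup_name is hypothetical.

```shell
#!/bin/sh
# Sketch only: warn if any ClusterRole that would be generated for the
# given OperatorGroup name already exists on the cluster.
check_operatorgroup_name() {
    og_name=$1
    rc=0
    for suffix in admin edit view; do
        role="${og_name}-${suffix}"
        # "oc get" exits non-zero when the ClusterRole is not found.
        if oc get clusterrole "$role" >/dev/null 2>&1; then
            echo "conflict: ClusterRole $role already exists"
            rc=1
        fi
    done
    return $rc
}
```

On a default cluster, check_operatorgroup_name cluster would be expected to report a conflict for cluster-admin.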