Cluster operator cloud-credential is degraded: InvalidClientTokenId
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.7+
Issue
- The cloud-credential operator is in a degraded state and reports the following conditions:
Conditions:
Last Transition Time: 2021-07-05T16:22:28Z
Status: True
Type: Available
Last Transition Time: 2021-08-27T07:33:30Z
Message: 1 of 5 credentials requests are failing to sync.
Reason: CredentialsFailing
Status: True
Type: Degraded
Last Transition Time: 2021-08-27T07:43:58Z
Message: 4 of 5 credentials requests provisioned, 1 reporting errors.
Reason: Reconciling
Status: True
Type: Progressing
- The Cloud Credential Operator pod logs show:
2021-08-27T07:44:00.242993569Z time="2021-08-27T07:44:00Z" level=info msg="validating cloud cred secret" controller=secretannotator
2021-08-27T07:44:00.315875863Z time="2021-08-27T07:44:00Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error gathering AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: <redacted>" controller=secretannotator
2021-08-27T07:44:00.405199151Z time="2021-08-27T07:44:00Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2021-08-27T07:44:00.792038495Z time="2021-08-27T07:44:00Z" level=error msg="cloud credentials insufficient to satisfy credentials request" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2021-08-27T07:44:00.792038495Z time="2021-08-27T07:44:00Z" level=error msg="error syncing credentials: cloud credentials insufficient to satisfy credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
2021-08-27T07:44:00.792062149Z time="2021-08-27T07:44:00Z" level=error msg="errored with condition: InsufficientCloudCreds" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
- The primary error is:
AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403
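To pick these errors out of a long operator log, a small filter can help. The following is a minimal sketch: the function name cco_token_errors is hypothetical, and the deployment and container names in the usage comment are assumed to be cloud-credential-operator, which may differ on your cluster.

```shell
# Print only the log lines that report rejected or insufficient AWS credentials.
# Reads log text on stdin; returns non-zero when no such line is found.
cco_token_errors() {
  grep -E 'InvalidClientTokenId|InsufficientCloudCreds'
}

# Example usage (assumed deployment and container names):
#   oc logs -n openshift-cloud-credential-operator deployment/cloud-credential-operator \
#     -c cloud-credential-operator | cco_token_errors
```

An empty result with a non-zero exit status suggests the degraded state has a different cause than the one described here.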
Resolution
- Delete the secret aws-cloud-credentials from the namespace openshift-machine-api to see whether the cloud credentials satisfy the credentials request, then wait for reconciliation.
[Note: In AWS Mint mode, the cluster operator creates a CredentialsRequest, which is reconciled by the Cloud Credential Operator (CCO). The CCO then creates the required IAM user or role in AWS, attaches the necessary policy statements, generates access keys, and stores them in a Kubernetes Secret that the operator uses to access AWS APIs.]
- The issue may stem from an MFA authentication setup that incorrectly uses short-term tokens instead of access keys.
- Use this AWS support article to generate a new credential access key.
- Refer to our documentation for installing on AWS with an IAM user account to configure the access user with a modified key instead of relying on an MFA token setup.
- Do NOT use the setup guide from https://aws.amazon.com/de/premiumsupport/knowledge-center/authenticate-mfa-cli/ as this will result in the errors above.
- You may also attempt to refresh the operator by recreating your secrets with the same key data:
$ oc create secret generic aws-cloud-credentials -n openshift-machine-api --from-literal="aws_access_key_id=${AWS_ACCESS_KEY_ID}" --from-literal="aws_secret_access_key=${AWS_SECRET_ACCESS_KEY}" --dry-run=client -o yaml | oc replace -f -
$ oc create secret generic aws-creds -n kube-system --from-literal="aws_access_key_id=${AWS_ACCESS_KEY_ID}" --from-literal="aws_secret_access_key=${AWS_SECRET_ACCESS_KEY}" --dry-run=client -o yaml | oc replace -f -
Ensure that the credentials stored in both secrets have the 'AdministratorAccess' access level:
$ oc get secret/aws-cloud-credentials -n openshift-machine-api
$ oc get secret/aws-creds -n kube-system
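Because the resolution depends on using long-term access keys rather than short-term tokens, it can be useful to check which kind of key a secret actually holds. The sketch below relies on the AWS convention that long-term IAM user access key IDs start with AKIA, while temporary STS credentials start with ASIA. The function names aws_key_kind and secret_key_kind are hypothetical, and the secret is assumed to store the key ID under aws_access_key_id, as in the commands above.

```shell
# Classify an AWS access key ID by its prefix.
# AKIA = long-term IAM user access key; ASIA = temporary STS credentials.
aws_key_kind() {
  case "$1" in
    AKIA*) echo "long-term" ;;
    ASIA*) echo "temporary" ;;
    *)     echo "unknown" ;;
  esac
}

# Read the access key ID stored in a secret and classify it.
# Arguments: <namespace> <secret-name>
secret_key_kind() {
  key_id=$(oc get secret "$2" -n "$1" -o jsonpath='{.data.aws_access_key_id}' | base64 -d)
  aws_key_kind "$key_id"
}

# Example usage:
#   secret_key_kind kube-system aws-creds
#   secret_key_kind openshift-machine-api aws-cloud-credentials
```

A result of "temporary" would be consistent with the MFA short-term token setup described above. Only the key ID is decoded, so the secret access key is never printed.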
Root Cause
- The issue is detailed in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1724684
- MFA CLI tools may not be fully supported and can cause access issues, because their tokens expire sooner than a long-running platform like RHOCP requires.
Diagnostic Steps
- Observed operators in degraded status; clusterversion -o yaml indicated that it was waiting on the cloud-credential operator.
- Describing the cloud-credential operator showed that some credentials were unable to sync.
- Checked pod logs, which indicated:
AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403
- Confirmed credentials were valid.
- Confirmed no SCP in place on AWS.
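The cluster-side checks listed above can be gathered with a few oc commands. This is a minimal sketch rather than an official procedure: the function name cco_diagnose is hypothetical, and the deployment and container names are assumed to be cloud-credential-operator in the openshift-cloud-credential-operator namespace.

```shell
# Collect the cluster-side information used in the diagnostic steps.
cco_diagnose() {
  # Cluster version status, which names the operator it is waiting on.
  oc get clusterversion -o yaml
  # Conditions reported by the cloud-credential cluster operator.
  oc describe clusteroperator cloud-credential
  # Overview of the CredentialsRequest objects handled by the CCO.
  oc get credentialsrequests -n openshift-cloud-credential-operator
  # Operator logs (assumed deployment and container names).
  oc logs -n openshift-cloud-credential-operator \
    deployment/cloud-credential-operator -c cloud-credential-operator
}

# Example usage:
#   cco_diagnose > cco-diagnostics.txt 2>&1
```

Wrapping the commands in a function keeps them together for capture into a single file. Validity of the credentials and the absence of an SCP still need to be confirmed on the AWS side.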