Cluster operator cloud-credential is degraded: InvalidClientTokenId


Environment

  • Red Hat OpenShift Container Platform 4.7+

Issue

  • The Cloud Credential Operator is in a degraded state and reports the following conditions:
Conditions:
    Last Transition Time:  2021-07-05T16:22:28Z
    Status:                True
    Type:                  Available
    Last Transition Time:  2021-08-27T07:33:30Z
    Message:               1 of 5 credentials requests are failing to sync.
    Reason:                CredentialsFailing
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-08-27T07:43:58Z
    Message:               4 of 5 credentials requests provisioned, 1 reporting errors.
    Reason:                Reconciling
    Status:                True
    Type:                  Progressing
  • The Cloud Credential Operator pod logs show:
2021-08-27T07:44:00.242993569Z time="2021-08-27T07:44:00Z" level=info msg="validating cloud cred secret" controller=secretannotator
2021-08-27T07:44:00.315875863Z time="2021-08-27T07:44:00Z" level=error msg="error while validating cloud credentials: failed checking create cloud creds: error gathering AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403, request id: <redacted>" controller=secretannotator
2021-08-27T07:44:00.405199151Z time="2021-08-27T07:44:00Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2021-08-27T07:44:00.792038495Z time="2021-08-27T07:44:00Z" level=error msg="cloud credentials insufficient to satisfy credentials request" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws
2021-08-27T07:44:00.792038495Z time="2021-08-27T07:44:00Z" level=error msg="error syncing credentials: cloud credentials insufficient to satisfy credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
2021-08-27T07:44:00.792062149Z time="2021-08-27T07:44:00Z" level=error msg="errored with condition: InsufficientCloudCreds" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
  • The primary error is:
AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403
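
To pick the error lines out of a longer operator log, a small filter can help. The following is a minimal sketch, not part of the original solution: cco_errors is a hypothetical helper name, and it assumes the logfmt-style lines shown above (level=error followed by msg="...").

```shell
# Sketch only: cco_errors is a hypothetical helper, not an oc subcommand.
# Reads Cloud Credential Operator log lines on stdin and prints each
# distinct error message once.
cco_errors() {
  # Keep only level=error lines, extract the text inside msg="...",
  # then de-duplicate the messages.
  sed -n 's/.*level=error msg="\([^"]*\)".*/\1/p' | sort -u
}
```

It could be fed from the operator logs, for example as shown below. The deployment name cloud-credential-operator is an assumption here; the namespace is the one that appears in the log lines above.

$ oc logs deployment/cloud-credential-operator -n openshift-cloud-credential-operator | cco_errors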

Resolution

  • Delete the secret aws-cloud-credentials from the namespace openshift-machine-api, then wait for reconciliation to see whether the cloud credentials satisfy the credentials request.

[Note: In AWS mint mode, the cluster operator creates a CredentialsRequest, which is reconciled by the Cloud Credential Operator (CCO). The CCO then creates the required IAM user or role in AWS, attaches the necessary policy statements, generates access keys, and stores them in a Kubernetes Secret that the operator uses to access AWS APIs.]
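
The delete-and-wait step can be sketched as a small shell function. This is an illustration under assumptions, not the documented procedure: recreate_machine_api_secret is a hypothetical helper name, and the 5m default timeout is an arbitrary choice.

```shell
# Sketch only: recreate_machine_api_secret is a hypothetical helper.
# Deletes the generated secret so the CCO recreates it, then waits for the
# cloud-credential cluster operator to leave the Degraded state.
recreate_machine_api_secret() {
  timeout="${1:-5m}"   # assumed default; adjust as needed
  # Remove the secret that the CCO generated for the Machine API.
  oc delete secret aws-cloud-credentials -n openshift-machine-api &&
  # Block until the operator reports Degraded=False, or the timeout expires.
  oc wait clusteroperator/cloud-credential \
    --for=condition=Degraded=False --timeout="$timeout" &&
  # Confirm that the secret exists again.
  oc get secret aws-cloud-credentials -n openshift-machine-api
}
```

If the wait times out, the operator is still degraded and the credentials themselves likely need attention.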


To replace the credentials stored in both secrets with a valid AWS access key pair:

$ oc create secret generic aws-cloud-credentials -n openshift-machine-api --from-literal="aws_access_key_id=${AWS_ACCESS_KEY_ID}" --from-literal="aws_secret_access_key=${AWS_SECRET_ACCESS_KEY}" --dry-run=client -o yaml | oc replace -f -
$ oc create secret generic aws-creds -n kube-system --from-literal="aws_access_key_id=${AWS_ACCESS_KEY_ID}" --from-literal="aws_secret_access_key=${AWS_SECRET_ACCESS_KEY}" --dry-run=client -o yaml | oc replace -f -

Ensure that the AWS credentials stored in both secrets have the 'AdministratorAccess' access level, and confirm that both secrets exist:

$ oc get secret/aws-cloud-credentials -n openshift-machine-api
$ oc get secret/aws-creds -n kube-system
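
Because `oc get secret` only confirms that the secrets exist, it may also help to check that AWS accepts the stored key pair. The following is a minimal sketch, assuming the AWS CLI (`aws`) is installed locally; check_secret_keys is a hypothetical helper name. It only checks that the keys are valid, not which policies are attached to them.

```shell
# Sketch only: check_secret_keys is a hypothetical helper.
# Reads the AWS key pair from a secret and asks AWS STS which identity it
# belongs to. Assumes the AWS CLI is installed.
check_secret_keys() {
  ns="$1"
  name="$2"
  # Secret data is base64-encoded, so decode each field after reading it.
  key_id="$(oc get secret "$name" -n "$ns" \
    -o jsonpath='{.data.aws_access_key_id}' | base64 --decode)" || return 1
  secret_key="$(oc get secret "$name" -n "$ns" \
    -o jsonpath='{.data.aws_secret_access_key}' | base64 --decode)" || return 1
  # An InvalidClientTokenId error here means AWS does not recognize the key.
  AWS_ACCESS_KEY_ID="$key_id" AWS_SECRET_ACCESS_KEY="$secret_key" \
    aws sts get-caller-identity
}
```

For example:

$ check_secret_keys kube-system aws-creds
$ check_secret_keys openshift-machine-api aws-cloud-credentials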

See also: https://access.redhat.com/solutions/4284011

Root Cause

Diagnostic Steps

  • Observed operators in a degraded state. The output of oc get clusterversion -o yaml indicated that the cluster version was waiting on the cloud-credential operator.

  • Running oc describe on the cloud-credential cluster operator showed that some credentials requests were unable to sync.

  • Checked the pod logs, which indicated:

AWS credentials details: error querying username: InvalidClientTokenId: The security token included in the request is invalid.\n\tstatus code: 403
  • Confirmed credentials were valid

  • Confirmed that no service control policy (SCP) was in place on AWS
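
The first diagnostic steps above can be sketched as a single shell function. This is an illustration only: cco_diagnostics is a hypothetical helper name, and the last command assumes that the CredentialsRequest objects live in the openshift-cloud-credential-operator namespace, as the cr= fields in the log lines suggest.

```shell
# Sketch only: cco_diagnostics is a hypothetical helper that runs the
# read-only checks described in the diagnostic steps.
cco_diagnostics() {
  # Shows which operator the cluster version is waiting on.
  oc get clusterversion -o yaml
  # Shows the Available, Degraded, and Progressing conditions.
  oc describe clusteroperator cloud-credential
  # Lists the credentials requests that the CCO reconciles.
  oc get credentialsrequests -n openshift-cloud-credential-operator
}
```

None of these commands change cluster state, so the function can be run repeatedly while troubleshooting.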

