Prometheus serviceaccount missing permissions to monitor services in user-defined namespaces

Solution Verified

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • Prometheus is unable to monitor services in the user-defined namespace dummyapp.
  • Logs from the pod prometheus-k8s-0 show permission errors when monitoring services in the namespace dummyapp:

    $ oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring | grep "cannot list resource"
    <...> 
    User "system:serviceaccount:openshift-monitoring:prometheus-k8s" cannot list resource "services" in API group "" in the namespace "dummyapp"
    <...>
    

    Note: This can happen in multiple namespaces.
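
Because the error can appear for several namespaces, it may help to reduce the log to the distinct namespaces involved. The following is a minimal sketch, not part of the original solution; the function name affected_namespaces is hypothetical, and it assumes the log lines have the "cannot list resource ... in the namespace" shape shown in this article.

```shell
# Hypothetical helper: read Prometheus log lines on stdin and print each
# namespace named in a "cannot list resource" error once, sorted.
# Handles both plain quotes ("dummyapp") and escaped quotes (\"dummyapp\").
affected_namespaces() {
  grep 'cannot list resource' \
    | sed -nE 's/.*in the namespace \\?"([^"\\]+).*/\1/p' \
    | sort -u
}

# Usage (requires cluster access):
#   oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring | affected_namespaces
```

Each namespace printed can then be checked against the situations described in the Resolution section.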

Resolution

Three situations can lead to this problem.

Situation 1:
The label openshift.io/cluster-monitoring: "true" is applied to a user-defined namespace, which causes the prometheus-k8s pods in openshift-monitoring to try to scrape metrics from that namespace.
As per the support considerations, this label should not be applied to user-defined namespaces.

To fix this, remove the label from the user-defined namespace and enable user workload monitoring as described in the product documentation. To remove the label, run the following command:

$ oc label namespace <name-of-namespace> openshift.io/cluster-monitoring-
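
To find which namespaces carry the label in the first place, a label selector can be combined with a name filter. This is a sketch under assumptions: the function name labelled_user_namespaces is hypothetical, and it treats every namespace not named openshift-* or kube-* as user-defined, which may wrongly include namespaces of Red Hat certified components (see Situation 3). Review the list before removing any label.

```shell
# Hypothetical helper: print namespaces that carry the cluster-monitoring
# label but are not named openshift-* or kube-*.
labelled_user_namespaces() {
  oc get namespace -l openshift.io/cluster-monitoring=true -o name \
    | sed 's|^namespace/||' \
    | grep -Ev '^(openshift|kube)-'
}

# Usage (requires cluster access):
#   labelled_user_namespaces
```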

Situation 2:
Additional user-defined ServiceMonitors have been created in the openshift-* or kube-* projects.
As per the support considerations, user-defined ServiceMonitors should not be created in these projects.

To fix this, remove the additional user-defined ServiceMonitors from the openshift-* and kube-* projects. For example, for a ServiceMonitor in openshift-monitoring:

$ oc -n openshift-monitoring delete servicemonitor <name-of-servicemonitor>
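
To review candidates before deleting anything, the ServiceMonitors in platform projects can be listed with their creation timestamps. This is only a sketch: the function name servicemonitors_in_platform_namespaces is hypothetical, and the listing cannot tell user-defined ServiceMonitors from those shipped with the platform, so each entry has to be judged manually (an unexpected name or a creation time well after installation may be a hint).

```shell
# Hypothetical helper: list ServiceMonitors that live in openshift-* or
# kube-* namespaces as "NAMESPACE NAME CREATED" rows for manual review.
servicemonitors_in_platform_namespaces() {
  oc get servicemonitors --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CREATED:.metadata.creationTimestamp \
    | awk '$1 ~ /^(openshift|kube)-/'
}

# Usage (requires cluster access):
#   servicemonitors_in_platform_namespaces
```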

Situation 3:
The namespace named in the log actually hosts core OpenShift Container Platform components or a Red Hat certified component.

If the namespace of a core OpenShift component or a Red Hat certified component is missing the expected roles or rolebindings, please open a support case with Red Hat. These resources are expected to be present by default, and their absence could be due to a bug in the relevant component.

Workaround

  • Make sure a role granting the required permissions exists in the namespace:

    $ cat role.yaml
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: prometheus-k8s
      namespace: <name-of-namespace>
    rules:
    - apiGroups:
      - ""
      resources:
      - services
      - endpoints
      - pods
      verbs:
      - get
      - list
      - watch
    
  • Make sure a rolebinding exists in the namespace that binds this role to the serviceaccount system:serviceaccount:openshift-monitoring:prometheus-k8s:

    $ cat rolebinding.yaml
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: prometheus-k8s
      namespace: <name-of-namespace>
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: prometheus-k8s
    subjects:
    - kind: ServiceAccount
      name: prometheus-k8s
      namespace: openshift-monitoring
    
  • Run the following commands and check the output to validate the role and rolebinding:

    $ oc get role,rolebinding -n <name-of-namespace> | grep -E "NAME|prometheus"
    NAME                                            CREATED AT
    role.rbac.authorization.k8s.io/prometheus-k8s   2023-07-31T20:44:45Z
    NAME                                                          ROLE                               AGE
    rolebinding.rbac.authorization.k8s.io/prometheus-k8s          Role/prometheus-k8s                15m
    
    $ oc get svc -n dummyapp --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
    NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
    httpd   ClusterIP   172.30.184.120   <none>        8080/TCP,8443/TCP   20m
    nginx   ClusterIP   None             <none>        80/TCP              18h
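
As an alternative to maintaining the two YAML files, the same role and rolebinding can be created imperatively. The sketch below is an assumption-laden equivalent of the manifests above, not a replacement for the supported fixes in the Resolution section; the function name grant_prometheus_rbac is hypothetical.

```shell
# Hypothetical helper: create the prometheus-k8s role and rolebinding in the
# given namespace, mirroring role.yaml and rolebinding.yaml above.
grant_prometheus_rbac() {
  [ -n "$1" ] || { echo "usage: grant_prometheus_rbac <namespace>" >&2; return 1; }
  # Role: get/list/watch on services, endpoints and pods.
  oc create role prometheus-k8s -n "$1" \
    --verb=get,list,watch \
    --resource=services,endpoints,pods &&
  # RoleBinding: bind the role to the Prometheus serviceaccount.
  oc create rolebinding prometheus-k8s -n "$1" \
    --role=prometheus-k8s \
    --serviceaccount=openshift-monitoring:prometheus-k8s
}

# Usage (requires cluster access):
#   grant_prometheus_rbac dummyapp
```

Note that oc create fails if a resource with the same name already exists, so this form suits first-time creation; the YAML files with oc apply are the better fit for repeated runs.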
    

Root Cause

  • The label openshift.io/cluster-monitoring: "true" is applied to a user-defined namespace.
  • Additional user-defined ServiceMonitors are created in the openshift-* and kube-* projects.
  • The role granting permissions to monitor the namespace is missing, or the rolebinding that binds this role to the serviceaccount system:serviceaccount:openshift-monitoring:prometheus-k8s is missing.

Diagnostic Steps

  • Logs from the pod prometheus-k8s-0 in the namespace openshift-monitoring show the following errors:

    $ oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring | grep "cannot list resource"
    <...>
    ts=2023-03-05T13:23:02.382Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"prometheus-external-monitoring\""
    ts=2023-03-05T13:23:14.475Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"kuberhealthy\""
    ts=2023-03-05T13:23:18.367Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"dummyapp\""
    ts=2023-03-05T13:23:24.338Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"prometheus-external-monitoring\""
    ts=2023-03-05T13:23:37.565Z caller=log.go:168 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"dummyapp\""
    <...>
    
  • The Prometheus serviceaccount system:serviceaccount:openshift-monitoring:prometheus-k8s reports that it lacks the privileges needed to query resources in the namespace dummyapp.
  • List the service resources in the namespace dummyapp while impersonating the Prometheus serviceaccount:

    $ oc get svc -n dummyapp --as=system:serviceaccount:openshift-monitoring:prometheus-k8s
    Error from server (Forbidden): services is forbidden: User "system:serviceaccount:openshift-monitoring:prometheus-k8s" cannot list resource "services" in API group "" in the namespace "dummyapp": RBAC: role.rbac.authorization.k8s.io "prometheus-k8s" not found
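
The impersonation check above covers one verb on one resource. A fuller picture can be obtained with oc auth can-i, which answers yes or no for a single verb and resource. The following sketch loops over the verbs and resources from the workaround role; the function name check_prometheus_access is hypothetical, and the impersonating user needs permission to impersonate serviceaccounts.

```shell
# Hypothetical helper: report whether the Prometheus serviceaccount may
# get/list/watch services, endpoints and pods in the given namespace.
check_prometheus_access() {
  sa="system:serviceaccount:openshift-monitoring:prometheus-k8s"
  for resource in services endpoints pods; do
    for verb in get list watch; do
      # can-i exits non-zero when the answer is "no", so ignore its status.
      answer=$(oc auth can-i "$verb" "$resource" -n "$1" --as="$sa" 2>/dev/null) || true
      printf '%s %s: %s\n' "$verb" "$resource" "${answer:-unknown}"
    done
  done
}

# Usage (requires cluster access):
#   check_prometheus_access dummyapp
```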
    

