Chapter 13. Prometheus and Grafana metrics under Red Hat Quay

Red Hat Quay exports a Prometheus- and Grafana-compatible endpoint on each instance to allow for easy monitoring and alerting.

13.1. Exposing the Prometheus endpoint

13.1.1. Standalone Red Hat Quay

When using podman run to start the Quay container, expose the metrics port 9091:

$ sudo podman run -d --rm -p 80:8080 -p 443:8443  -p 9091:9091\
   --name=quay \
   -v $QUAY/config:/conf/stack:Z \
   -v $QUAY/storage:/datastorage:Z \
   registry.redhat.io/quay/quay-rhel8:v3.8.0

The metrics will now be available:

$ curl quay.example.com:9091/metrics

See Monitoring Quay with Prometheus and Grafana for details on configuring Prometheus and Grafana to monitor Quay repository counts.

13.1.2. Red Hat Quay Operator

Determine the cluster IP for the quay-metrics service:

$ oc get services -n quay-enterprise
NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                             AGE
example-registry-clair-app            ClusterIP   172.30.61.161    <none>        80/TCP,8089/TCP                     18h
example-registry-clair-postgres       ClusterIP   172.30.122.136   <none>        5432/TCP                            18h
example-registry-quay-app             ClusterIP   172.30.72.79     <none>        443/TCP,80/TCP,8081/TCP,55443/TCP   18h
example-registry-quay-config-editor   ClusterIP   172.30.185.61    <none>        80/TCP                              18h
example-registry-quay-database        ClusterIP   172.30.114.192   <none>        5432/TCP                            18h
example-registry-quay-metrics         ClusterIP   172.30.37.76     <none>        9091/TCP                            18h
example-registry-quay-redis           ClusterIP   172.30.157.248   <none>        6379/TCP                            18h

Connect to your cluster and access the metrics using the cluster IP and port for the quay-metrics service:

$ oc debug node/master-0

sh-4.4# curl 172.30.37.76:9091/metrics

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.0447e-05
go_gc_duration_seconds{quantile="0.25"} 6.2203e-05
...

13.1.3. Setting up Prometheus to consume metrics

Prometheus needs a way to access all Red Hat Quay instances running in a cluster. In the typical setup, this is done by listing all the Red Hat Quay instances in a single named DNS entry, which is then given to Prometheus.

13.1.4. DNS configuration under Kubernetes

A simple Kubernetes service can be configured to provide the DNS entry for Prometheus.

13.1.5. DNS configuration for a manual cluster

SkyDNS is a simple solution for managing this DNS record when not using Kubernetes. SkyDNS can run on an etcd cluster. Entries for each Red Hat Quay instance in the cluster can be added and removed in the etcd store. SkyDNS will regularly read them from there and update the list of Quay instances in the DNS record accordingly.

13.2. Introduction to metrics

Red Hat Quay provides metrics to help monitor the registry, including metrics for general registry usage, uploads, downloads, garbage collection, and authentication.

13.2.1. General registry statistics

General registry statistics can indicate how large the registry has grown.

Metric nameDescription

quay_user_rows

Number of users in the database

quay_robot_rows

Number of robot accounts in the database

quay_org_rows

Number of organizations in the database

quay_repository_rows

Number of repositories in the database

quay_security_scanning_unscanned_images_remaining_total

Number of images that are not scanned by the latest security scanner

Sample metrics output

# HELP quay_user_rows number of users in the database
# TYPE quay_user_rows gauge
quay_user_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 3

# HELP quay_robot_rows number of robot accounts in the database
# TYPE quay_robot_rows gauge
quay_robot_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 2

# HELP quay_org_rows number of organizations in the database
# TYPE quay_org_rows gauge
quay_org_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 2

# HELP quay_repository_rows number of repositories in the database
# TYPE quay_repository_rows gauge
quay_repository_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 4

# HELP quay_security_scanning_unscanned_images_remaining number of images that are not scanned by the latest security scanner
# TYPE quay_security_scanning_unscanned_images_remaining gauge
quay_security_scanning_unscanned_images_remaining{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 5

13.2.2. Queue items

The queue items metrics provide information on the multiple queues used by Quay for managing work.

Metric nameDescription

quay_queue_items_available

Number of items in a specific queue

quay_queue_items_locked

Number of items that are running

quay_queue_items_available_unlocked

Number of items that are waiting to be processed

Metric labels

  • queue_name: The name of the queue. One of:

    • exportactionlogs: Queued requests to export action logs. These logs are then processed and put in storage. A link is then sent to the requester via email.
    • namespacegc: Queued namespaces to be garbage collected
    • notification: Queue for repository notifications to be sent out
    • repositorygc: Queued repositories to be garbage collected
    • secscanv4: Notification queue specific for Clair V4
    • dockerfilebuild: Queue for Quay docker builds
    • imagestoragereplication: Queued blob to be replicated across multiple storages
    • chunk_cleanup: Queued blob segments that needs to be deleted. This is only used by some storage implementations, for example, Swift.

For example, the queue labelled repositorygc contains the repositories marked for deletion by the repository garbage collection worker. For metrics with a queue_name label of repositorygc:

  • quay_queue_items_locked is the number of repositories currently being deleted.
  • quay_queue_items_available_unlocked is the number of repositories waiting to get processed by the worker.

Sample metrics output

# HELP quay_queue_items_available number of queue items that have not expired
# TYPE quay_queue_items_available gauge
quay_queue_items_available{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0
...

# HELP quay_queue_items_available_unlocked number of queue items that have not expired and are not locked
# TYPE quay_queue_items_available_unlocked gauge
quay_queue_items_available_unlocked{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0
...

# HELP quay_queue_items_locked number of queue items that have been acquired
# TYPE quay_queue_items_locked gauge
quay_queue_items_locked{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0

13.2.3. Garbage collection metrics

These metrics show you how many resources have been removed from garbage collection (gc). They show many times the gc workers have run and how many namespaces, repositories, and blobs were removed.

Metric nameDescription

quay_gc_iterations_total

Number of iterations by the GCWorker

quay_gc_namespaces_purged_total

Number of namespaces purged by the NamespaceGCWorker

quay_gc_repos_purged_total

Number of repositories purged by the RepositoryGCWorker or NamespaceGCWorker

quay_gc_storage_blobs_deleted_total

Number of storage blobs deleted

Sample metrics output

# TYPE quay_gc_iterations_created gauge
quay_gc_iterations_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189714e+09
...

# HELP quay_gc_iterations_total number of iterations by the GCWorker
# TYPE quay_gc_iterations_total counter
quay_gc_iterations_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...

# TYPE quay_gc_namespaces_purged_created gauge
quay_gc_namespaces_purged_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189433e+09
...

# HELP quay_gc_namespaces_purged_total number of namespaces purged by the NamespaceGCWorker
# TYPE quay_gc_namespaces_purged_total counter
quay_gc_namespaces_purged_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
....

# TYPE quay_gc_repos_purged_created gauge
quay_gc_repos_purged_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.631782319018925e+09
...

# HELP quay_gc_repos_purged_total number of repositories purged by the RepositoryGCWorker or NamespaceGCWorker
# TYPE quay_gc_repos_purged_total counter
quay_gc_repos_purged_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...

# TYPE quay_gc_storage_blobs_deleted_created gauge
quay_gc_storage_blobs_deleted_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189059e+09
...

# HELP quay_gc_storage_blobs_deleted_total number of storage blobs deleted
# TYPE quay_gc_storage_blobs_deleted_total counter
quay_gc_storage_blobs_deleted_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...

13.2.3.1. Multipart uploads metrics

The multipart uploads metrics show the number of blobs uploads to storage (S3, Rados, GoogleCloudStorage, RHOCS). These can help identify issues when Quay is unable to correctly upload blobs to storage.

Metric nameDescription

quay_multipart_uploads_started_total

Number of multipart uploads to Quay storage that started

quay_multipart_uploads_completed_total

Number of multipart uploads to Quay storage that completed

Sample metrics output

# TYPE quay_multipart_uploads_completed_created gauge
quay_multipart_uploads_completed_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823308284895e+09
...

# HELP quay_multipart_uploads_completed_total number of multipart uploads to Quay storage that completed
# TYPE quay_multipart_uploads_completed_total counter
quay_multipart_uploads_completed_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0

# TYPE quay_multipart_uploads_started_created gauge
quay_multipart_uploads_started_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823308284352e+09
...

# HELP quay_multipart_uploads_started_total number of multipart uploads to Quay storage that started
# TYPE quay_multipart_uploads_started_total counter
quay_multipart_uploads_started_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...

13.2.4. Image push / pull metrics

A number of metrics are available related to pushing and pulling images.

13.2.4.1. Image pulls total

Metric nameDescription

quay_registry_image_pulls_total

The number of images downloaded from the registry.

Metric labels

  • protocol: the registry protocol used (should always be v2)
  • ref: ref used to pull - tag, manifest
  • status: http return code of the request

13.2.4.2. Image bytes pulled

Metric nameDescription

quay_registry_image_pulled_estimated_bytes_total

The number of bytes downloaded from the registry

Metric labels

  • protocol: the registry protocol used (should always be v2)

13.2.4.3. Image pushes total

Metric nameDescription

quay_registry_image_pushes_total

The number of images uploaded from the registry.

Metric labels

  • protocol: the registry protocol used (should always be v2)
  • pstatus: http return code of the request
  • pmedia_type: the uploaded manifest type

13.2.4.4. Image bytes pushed

Metric nameDescription

quay_registry_image_pushed_bytes_total

The number of bytes uploaded to the registry

Sample metrics output

# HELP quay_registry_image_pushed_bytes_total number of bytes pushed to the registry
# TYPE quay_registry_image_pushed_bytes_total counter
quay_registry_image_pushed_bytes_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application"} 0
...

13.2.5. Authentication metrics

The authentication metrics provide the number of authentication requests, labeled by type and whether it succeeded or not. For example, this metric could be used to monitor failed basic authentication requests.

Metric nameDescription

quay_authentication_attempts_total

Number of authentication attempts across the registry and API

Metric labels

  • auth_kind: The type of auth used, including:

    • basic
    • oauth
    • credentials
  • success: true or false

Sample metrics output

# TYPE quay_authentication_attempts_created gauge
quay_authentication_attempts_created{auth_kind="basic",host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application",success="True"} 1.6317843039374158e+09
...

# HELP quay_authentication_attempts_total number of authentication attempts across the registry and API
# TYPE quay_authentication_attempts_total counter
quay_authentication_attempts_total{auth_kind="basic",host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application",success="True"} 2
...