v3.11 Configuring Clusters 3.2.22. Exposing Router Metrics

I opened a case https://access.redhat.com/support/cases/#/case/02311827 to find out why haproxy router stats were not available in html format.

I believe the v3.11 ose-haproxy-router container has a bug and the documentation is inaccurate.

The documentation at https://access.redhat.com/documentation/en-us/openshift_container_platform/3.11/html-single/configuring_clusters/#exposing-the-router-metrics
and
https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#exposing-the-router-metrics

says to remove the environment variables from the router deploymentconfig:

    - name: ROUTER_LISTEN_ADDR
      value: 0.0.0.0:1936
    - name: ROUTER_METRICS_TYPE
      value: haproxy

A snippet of the default router deploymentconfig is:

spec:
  containers:
  - env:
    - name: ROUTER_LISTEN_ADDR
      value: 0.0.0.0:1936
    - name: ROUTER_METRICS_TYPE
      value: haproxy
    image: registry.redhat.io/openshift3/ose-haproxy-router:v3.11
    livenessProbe:
      failureThreshold: 3
      httpGet:
        host: localhost
        path: /healthz
        port: 1936
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: router
    ports:
    - containerPort: 80
      hostPort: 80
      protocol: TCP
    - containerPort: 443
      hostPort: 443
      protocol: TCP
    - containerPort: 1936
      hostPort: 1936
      name: stats
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: localhost
        path: healthz/ready
        port: 1936
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

I discovered that when you remove the environment variable from the router deploymentconfig:

    - name: ROUTER_METRICS_TYPE
      value: haproxy

This changes the router container to reply with HTML, and the readinessProbe at path: healthz/ready no longer replies with ok. It replies with the full stats HTML page instead (and access to it requires a username and password). Because the probe never gets an ok reply, the router deployment fails after 10 minutes and rolls back to the previous router dc.

I changed the readinessProbe path to match the livenessProbe, so BOTH livenessProbe & readinessProbe now use path: /healthz.
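The two edits described above could also be applied from the command line instead of editing the deploymentconfig by hand. This is only a sketch: it assumes the router dc is named router in the default namespace (as in the responses below), and that `oc set env` with a trailing `-` and `oc set probe` with `--readiness` and `--get-url` behave on your client version as they do on a stock 3.11 `oc`. The function name `apply_router_edits` and the `OC` override are made up for illustration.

```shell
#!/bin/sh
# OC may be overridden (for example OC=echo) to preview the commands
# without touching the cluster. This override is an illustrative assumption.
OC="${OC:-oc}"

apply_router_edits() {
    # Edit 1: remove ROUTER_METRICS_TYPE (a trailing "-" unsets a variable).
    "$OC" set env dc/router ROUTER_METRICS_TYPE- -n default

    # Edit 2: point the readinessProbe at the same path as the livenessProbe.
    "$OC" set probe dc/router --readiness --get-url=http://localhost:1936/healthz -n default
}
```

Running `OC=echo; apply_router_edits` prints the two commands for review; calling `apply_router_edits` with `OC` left unset runs them against the cluster you are logged in to.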

By making these two edits to the router deploymentconfig I was able to successfully redeploy a new set of routers.

I made my 2 changes to the default config and a router node now replies with:

[root@vmlxopencd01 ~]# curl vmlxopencd06.osb.spectrum-health.org:1936/healthz/ready

401 Unauthorized

You need a valid user and password to access this content.

That is not the expected reply from the default readinessProbe path.

[root@vmlxopencd01 ~]# curl vmlxopencd06.osb.spectrum-health.org:1936/healthz

200 OK

Service ready.

That is the expected reply from the livenessProbe path.
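The two curl checks above can be turned into a small script that reports whether each path would satisfy an httpGet probe. This is a sketch, not part of the router tooling: Kubernetes treats any status from 200 through 399 as a passing httpGet probe, so the 401 reported above counts as a failure while the 200 counts as a pass. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds when an HTTP status code would pass a Kubernetes httpGet probe
# (200 through 399 inclusive).
probe_passes() {
    code="$1"
    [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Prints only the HTTP status code returned for the given URL.
probe_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Reports whether the given URL would pass or fail as a probe target.
check_probe() {
    url="$1"
    code=$(probe_status "$url")
    if probe_passes "$code"; then
        echo "$url -> $code (probe would pass)"
    else
        echo "$url -> $code (probe would fail)"
    fi
}
```

For example, `check_probe http://ROUTER_HOST:1936/healthz/ready` and `check_probe http://ROUTER_HOST:1936/healthz`, where ROUTER_HOST is a placeholder for one of your router nodes.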

If I provide the admin:password to the default readinessProbe path it replies with the full haproxy html stats page:

[root@vmlxopencd01 ~]# curl admin:NmtUmwV7gq@vmlxopencd06.osb.spectrum-health.org:1936/healthz/ready

Statistics Report for HAProxy

I think the container replies incorrectly to the default readinessProbe path when the environment variable ROUTER_METRICS_TYPE is not set to value: haproxy.

-Paul VanAllsburg

Responses

oc set env dc/router STATS_USERNAME=admin STATS_PASSWORD=password -n default

Then you can run the following if your router has an IP of 10.10.92.102:

curl -u admin:password http://10.10.92.102:1936/metrics

or

oc get --raw /metrics --server http://admin:password@10.10.92.102:1936